CHAPTER 1


INTRODUCTION  AND  OUTLINE  OF  REPORT 


1.1  INTRODUCTION

This  report  describes  the  results  of  a  twelve-month  study  supported 
under  DCA  Contract  100-80-C-0050  of  a  new  speech  digitization  algorithm 
combining  Time-Domain  Harmonic  Scaling  (TDHS)  and  Adaptive  Residual  Coding 
(ARC).  The  FORTRAN  simulation  of  this  system  conducted  as  part  of  this 
study  produces  high  quality  speech  reproduction  at  medium  band  bit  rates  of 
9.6 kb/s and 16 kb/s. This system also displays excellent robustness
characteristics for channel bit error rates as high as 1% and for acoustic
background  noise.  By  basing  the  required  pitch  extraction  on  a  three-level 
clipped  signal,  the  hardware  requirements  for  the  system  are  modest. 

The TDHS algorithm was developed by Malah (1979) and applied by him
(Malah, 1980) to a CVSD system at a transmission rate of 7.2 kb/s. More
recently, Malah (1981) has combined TDHS with Transform Coding and Sub-band
Coding at medium-band bit rates. The algorithm weights several adjacent
input signal segments of pitch-dependent duration by suitable window
functions. As a result, the number of samples to be transmitted can be
reduced by a factor of two. If the bit rate is kept the same, the number of
bits allowed per sample is doubled, and the performance of the coder can be
improved significantly. The ARC structure was developed by Cohn and Melsa
(1975b) and implemented in hardware by CODEX (Qureshi and Forney, 1975).
This structure combines a pitch-compensating adaptive quantizer (Cohn and
Melsa, 1976), a sequentially adaptive linear predictor, and adaptive source
coding.

The combined algorithm has several features which become significant
in a full system application. Because the algorithm is a high-performance,
waveform-matching algorithm, extremely good performance in the tandem
configuration with other algorithms is anticipated. Since the technique is
basically a waveform reconstruction technique, it will perform well on
non-speech signals such as in-band signaling and modem tones.


1.2  ALGORITHM  OBJECTIVE 

The following objectives for the speech coding algorithm have been
established from the Statement of Work:

1.  The speech processing system shall operate at medium-band
transmission data rates of 9.6 kb/s and 16 kb/s.

2.  The  speech  processing  system  shall  produce  toll  quality  speech 
reproduction. 

3.  The  audio  bandwidth  of  the  input  speech  shall  be  greater  than 
or  equal  to  3200  Hz. 

4.  The speech coder shall produce good quality speech under conditions
of a random transmission bit error rate of 1 percent.

5.  The speech coder shall produce toll quality speech under 60 dB
(referenced to 20 µnewtons/meter²) of acoustic background noise
such as office noise, and good quality speech under 100 dB of
acoustic background noise.

6.  The  computational  complexity  of  the  algorithm  shall  be 
minimized. 

7.  The  system  shall  be  capable  of  processing  non-speech  signals, 
such  as  in-band  signaling  and  modem  tones. 

8.  Tandem  connection  with  other  algorithms  such  as  CVSD  and  LPC-10 
should  cause  negligible  distortion. 


1.3  OUTLINE OF THE REPORT

The design and development of the TDHS-ARC algorithm is described in
three chapters. Chapter 2 contains the research work pertaining to Time
Domain Harmonic Scaling. The recent development of TDHS is described to
provide the necessary background material. Research problems such as
sampling rate, window design, compression ratio and pitch extraction are
addressed in this chapter. The design of an Adaptive Residual Coder for the
frequency compressed speech signal is outlined in Chapter 3. Various
objective performance measures and a new fixed wordlength source code are
also described in this chapter. The complete system structure is presented
in Chapter 4. The effect of transmission errors and background noise on the
system performance is given. The strategy to control the buffer behaviour
and the scheme for extracting pitch at the receiver are also discussed in
this chapter. Chapter 5 describes the modifications of the algorithm which
are needed to operate at 16 kb/s.

The Fourier transforms of several TDHS window functions are given in
Appendix A. The flow charts for the FORTRAN simulation are given in
Appendix B, while the source listings are given in Appendix C.


CHAPTER  2 


TIME  DOMAIN  HARMONIC  SCALING 


2.1  INTRODUCTION 

In Chapter 1, it was indicated that the system uses time domain harmonic
scaling (TDHS) to reduce the number of speech samples to be transmitted
without causing excessive distortion. This process also allows an increased
number of bits per sample to be available for coding. The TDHS algorithm
uses a pitch-adaptive window to perform frequency compression or expansion
on the speech signal. Such frequency scaling operations depend on various
factors such as the type of window, the pitch extraction method used and
the amount of frequency compression employed. In this chapter, these and
other factors affecting TDHS performance are discussed and results are
presented.

2.2  FREQUENCY  COMPRESSION  AND  EXPANSION 

The time-varying Fourier representation of speech has successfully
been used in vocoders. The techniques used for frequency scaling in these
vocoders are fairly complex and, therefore, have not been extended to
medium-band speech coders. Recently, a time-domain algorithm for frequency
scaling was developed by Malah [April, 1979] and applied by him [April,
1980] to a CVSD system at a transmission rate of 7200 bits/second.

The algorithm is quite general and involves choices of such parameters
as windowing function and scaling factors. One specific, and most common,
form of the algorithm is presented below using triangular windows and 2 to
1 scaling. The TDHS algorithm makes use of the long-term pitch redundancy
of speech signals in a manner that is similar to gapped analysis [Melsa,
et al., 1980]. However, by a clever choice of the time-domain windowing
function, the TDHS algorithm is able to ensure continuity across the frame
boundaries. At the transmitter, the basic concept is to compress two pitch
periods of speech into a single pitch period of the same time duration but
at half the sampling rate. At the receiver, the compressed signal is
frequency multiplied to reconstitute an approximation of the original input
signal. Consider first the frequency compression operation.

Suppose that speech samples up to sample number k0 have been processed;
the corresponding output sample number is m0 = k0/2. The first step is to
determine the pitch period associated with the samples following k0 by any
standard method such as correlation or AMDF. Let the resulting pitch
period, in samples, be Np. The value of Np during unvoiced speech or
silence can be set arbitrarily. Consider the 2Np samples from k0+1 to
k0+2Np as shown in Fig. 2.1. Note that these samples need not be pitch
synchronous. These 2Np samples are frequency compressed into Np samples by
use of the following TDHS algorithm, where y(m) is the compressed output
and s(k) is the original speech.

$$y(m_0+i) = s(k_0+i)\,h(i:N_p) + s(k_0+N_p+i)\,[1 - h(i:N_p)], \qquad i = 1, 2, \ldots, N_p \tag{2.1}$$

Here

$$h(i:N_p) = \begin{cases} 1 - \dfrac{i-1}{N_p-1}, & 1 \le i \le N_p \\ 0, & \text{otherwise} \end{cases}$$

Equation (2.1) can be rewritten as

$$y(m_0+i) = s(k_0+N_p+i) + h(i:N_p)\,[s(k_0+i) - s(k_0+N_p+i)] \tag{2.2}$$

to indicate that only one multiplication and two additions are required
per output sample. The frequency compression operation is illustrated
in Fig. 2.1.
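
As an illustration, Eq. (2.2) can be sketched in a few lines of Python. This is a minimal sketch and not the FORTRAN simulation used in the study: the names `triangular_window` and `compress_frame`, and the 1-based list convention (index 0 unused), are assumptions made here so that the indices match the equations.

```python
def triangular_window(i, n_p):
    """h(i:Np) = 1 - (i-1)/(Np-1) for 1 <= i <= Np, and 0 otherwise."""
    if 1 <= i <= n_p:
        return 1.0 - (i - 1) / (n_p - 1)
    return 0.0


def compress_frame(s, k0, n_p):
    """Compress samples s(k0+1)..s(k0+2Np) into Np output samples.

    s is treated as 1-based: s[k] holds sample k and s[0] is unused
    (a convention assumed here, not taken from the report).
    Requires n_p >= 2 so that the window slope is defined.
    """
    out = []
    for i in range(1, n_p + 1):
        first = s[k0 + i]            # sample from the first pitch period
        second = s[k0 + n_p + i]     # sample one pitch period later
        # Eq. (2.2): one multiplication and two additions per output sample
        out.append(second + triangular_window(i, n_p) * (first - second))
    return out
```

With s(k) = k and Np = 4, the first output equals s(k0+1) and the last equals s(k0+2Np), which is the boundary behaviour the window is chosen to give.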

As long as the window function h(i:Np) satisfies the properties

$$h(1:N_p) = 1, \qquad h(N_p:N_p) = 0 \tag{2.3}$$

the following continuity conditions will be satisfied

$$y(m_0) = s(k_0), \qquad y(m_0+1) = s(k_0+1) \tag{2.4}$$

and

$$y(m_0+N_p) = s(k_0+2N_p), \qquad y(m_0+N_p+1) = s(k_0+2N_p+1) \tag{2.5}$$

At the receiver, it is necessary to use a frequency multiplication
procedure to regenerate the 2Np samples from the Np samples of y(m).
Using the TDHS algorithm this is accomplished as

$$s(k_0+i) = y(m_0+i)\,h(i:2N_p) + y(m_0-N_p+i)\,[1 - h(i:2N_p)], \qquad i = 1, 2, 3, \ldots, 2N_p \tag{2.6}$$

The  frequency  multiplication  operation  is  illustrated  in  Fig.  2.2.  Once 
again  if  the  window  function  satisfies  Eq.  (2.3),  continuity  will  be 
ensured  across  the  frame  boundary. 
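
A corresponding sketch of the expansion step in Eq. (2.6) is given below in Python. It is illustrative only: the name `expand_frame` and the 1-based list convention are assumptions, and h(i:2Np) is taken to be the triangular window of this section with Np replaced by 2Np. Note that, as written, the formula reads compressed samples from the neighbouring frames as well as from the current one.

```python
def expand_frame(y, m0, n_p):
    """Regenerate 2Np samples from the compressed signal y via Eq. (2.6).

    y is treated as 1-based (y[m] holds sample m, y[0] unused) and must
    contain samples m0-Np+1 .. m0+2Np, i.e. the previous, current and
    next compressed frames. This indexing convention is an assumption.
    """
    out = []
    for i in range(1, 2 * n_p + 1):
        # Triangular window of length 2Np: 1 at i = 1, 0 at i = 2Np
        h = 1.0 - (i - 1) / (2 * n_p - 1)
        out.append(y[m0 + i] * h + y[m0 - n_p + i] * (1.0 - h))
    return out
```

For a compressed signal that is exactly periodic with period Np, both terms carry the same value and the output reproduces it, which makes a convenient sanity check.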

The frequency spectra of the original speech, compressed speech and
expanded speech are shown in Figs. 2.3 & 2.4. The plot is for 20 msec of
voiced speech. It can be seen that the frequency spectra of the original
and expanded speech match very well.

Fig. 2.4  Frequency Spectrum


2.3  SAMPLING RATE

Samples of an analog signal are a unique representation if the analog
signal is bandlimited and if the sampling rate exceeds the Nyquist rate,
i.e. twice the highest frequency present in the signal. Speech signals are
not inherently bandlimited, although the spectrum does fall off rapidly at
high frequencies. For voiced sounds, the spectrum above 4 kHz is more than
40 dB below its peak. On the other hand, for unvoiced sounds, the spectrum
does not fall off appreciably even above 8 kHz. However, telephone
transmission has a bandlimiting effect on speech signals, and the maximum
frequency in speech signals can be considered as 3.2-3.5 kHz for
conversational or "telephone quality" speech.

TDHS algorithms are based on the assumption that the fundamental
frequency F0 (the pitch) of the input voiced-speech signal is known. If
the estimated pitch frequency is Fp, then the error in the frequency
estimate is Fp - F0. This error in pitch estimation can be tolerated
[Malah, 1979] if

$$|F_p - F_0| < \frac{F_0}{2L}$$


where L = number of harmonics present in the bandlimited periodic input
signal. It is obvious that accuracy in the determination of the pitch
period is important. Since a pitch estimator extracts pitch in terms of an
integer number of samples, the accuracy of the pitch extracted depends on
the sampling frequency of the input speech signal. As the sampling
frequency is increased, the pitch period resolution is improved. However,
by increasing the sampling rate or oversampling the speech signal, fewer
bits per sample are available for coding the quantizer levels. For example,
if the sampling rate is 6400 samples/sec (3200 samples/sec for compressed
speech) and the transmission rate is 9600 bits/sec, the average number of
bits per sample is 3. For a sampling rate of 10000 samples/sec, the pitch
period estimation becomes 36% more accurate while the average entropy
allowed drops from 3 bits to 1.92 bits/sample. Hence, there is a trade-off
between the improvement in performance due to increased pitch accuracy and
the degradation due to the decrease in entropy.
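
The arithmetic behind this trade-off can be reproduced with a short Python sketch. The function names are hypothetical, and the calculation assumes that every transmitted bit is spent on samples (no side information), which is how the figures in the text appear to have been obtained.

```python
def bits_per_sample(bit_rate, sampling_rate, compression=2.0):
    """Average bits available per transmitted sample.

    sampling_rate is the input rate in samples/s; after TDHS compression
    only sampling_rate / compression samples/s are transmitted.
    Bits for side information (e.g. pitch) are ignored in this sketch.
    """
    return bit_rate / (sampling_rate / compression)


def resolution_gain(fs_low, fs_high):
    """Fractional reduction of the pitch-period step (one sample = 1/fs)
    obtained by raising the sampling rate from fs_low to fs_high."""
    return 1.0 - fs_low / fs_high


# Bits per sample at 9600 bits/s for several candidate sampling rates
for fs in (6400, 8000, 9600, 10000):
    print(fs, round(bits_per_sample(9600, fs), 2))
```

For 6400 and 10000 samples/sec the sketch gives 3 and 1.92 bits per sample, and `resolution_gain(6400, 10000)` evaluates to 0.36, in agreement with the figures quoted above.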

To study this trade-off, input speech was sampled at three different
sampling frequencies, namely 6.4, 8 and 9.6 kHz. First, only frequency
compression and expansion operations were considered. Informal listening
tests have shown that unvoiced (higher frequency) speech sounds much better
for higher sampling rates than for lower ones. However, overall speech
quality does not differ significantly. When quantization was introduced
(TDHS-ARC system), no significant change in quality was noticed for the
different sampling rates. As indicated earlier, fewer bits per sample are
available for the higher sampling rates. This results in more quantization
noise, which masks the improvement obtained in the unvoiced sounds by
higher sampling rates. The sampling frequency for the system with a
transmission rate of 9.6 kb/s was chosen to be 6400 Hz. With more bits per
sample available for coding in the 16 kb/s system, a sampling frequency of
8 kHz may be a good choice.

2.4  WINDOW  DESIGN 

To determine the proper window function to be used, the requirements
and the constraints that should be satisfied by this function need to be
discussed.

As mentioned earlier, the TDHS algorithm consists of properly weighting
several adjacent input signal segments (with pitch-dependent duration) by a
suitable window function to produce an output segment. Since the pitch
period Np varies, it is necessary that adjacent segments processed with
different values of Np maintain output signal continuity at the interface
between segments. This can be written in equation form as


$$y(m_0) = s(k_0), \qquad y(m_0+1) = s(k_0+1) \tag{2.7}$$

$$y(m_0+N_p) = s(k_0+2N_p), \qquad y(m_0+N_p+1) = s(k_0+2N_p+1) \tag{2.8}$$

where k0 is the sample number up to which speech samples have been
processed and m0 = k0/2 is the corresponding output sample number.

From Eq. (2.1) it is known that

$$y(m_0+i) = s(k_0+i)\,h(i:N_p) + s(k_0+N_p+i)\,[1 - h(i:N_p)], \qquad i = 1, 2, \ldots, N_p \tag{2.9}$$

For i = 1,

$$y(m_0+1) = s(k_0+1)\,h(1:N_p) + s(k_0+N_p+1)\,[1 - h(1:N_p)]$$

and for i = Np,

$$y(m_0+N_p) = s(k_0+N_p)\,h(N_p:N_p) + s(k_0+2N_p)\,[1 - h(N_p:N_p)] \tag{2.10}$$

To satisfy the continuity conditions in Eqs. (2.7) and (2.8),

$$h(1:N_p) = 1, \qquad h(N_p:N_p) = 0$$


Fig. 2.5  Different window functions (triangular, cosine, Hanning).

Another constraint is imposed by the fact that if this algorithm is
used for periodic signals, then exact frequency scaling should be obtained.
If the signal has period Np and h(n:Np) is the window function, frequency
division by two can be achieved as follows

$$y(n) = s(n)\,h(n:N_p) + s(n+N_p)\,h(n+N_p:N_p), \qquad n = 1, 2, 3, \ldots, N_p$$

Since s(n) is periodic,

$$s(n) = s(n+N_p)$$

hence

$$y(n) = s(n)\,[h(n:N_p) + h(n+N_p:N_p)].$$

For exact frequency division by two, it is therefore necessary that
h(n:Np) satisfy

$$h(n:N_p) + h(n+N_p:N_p) = 1 \tag{2.11}$$

or

$$h(n+N_p:N_p) = 1 - h(n:N_p)$$

There are various types of window functions which satisfy the above
constraints and hence could be used in the TDHS algorithm. Some of the
possible window functions are shown in Fig. 2.5.

The choice of window depends upon the simplicity of implementation, the
number of computations required and the performance. The performance of a
particular window is measured in terms of the quality of output speech
produced with that window choice. The best method of measuring the quality
of output speech is by listening to it. However, this criterion is
subjective and, besides, very time consuming to use. The other criteria
which are frequently used for measuring the quality of speech are the
segmental signal-to-noise ratio (SEGSNR) in the time and frequency domains.
The SEGSNR in the time domain is the average of the SNRs calculated for all
the segments of speech. The typical length of a segment is 20 msec. The
SEGSNR in the frequency domain is calculated in a similar way, except that
the SNRs are computed for the frequency components of the speech samples.
The frequency components are obtained by taking a DFT of the input and
output speech segments. It was found that the SEGSNR in the frequency
domain very closely reflects the quality of output speech and thus could
form a good objective measure for this system. The details are discussed in
Chapter 3. Table 2.1 shows various window functions and their performance.
The frequency responses of these functions are outlined in Appendix A. It
can be seen from Table 2.1 that the performance of the TDHS algorithm,
obtained for a two-second male utterance, is almost the same for the
different window functions. However, the complexity of these windows varies
considerably. For example, the triangular window is very simple to
implement, while its performance is slightly worse than that of the Hanning
window, which requires more computations.
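
The two objective measures can be sketched in Python as follows. This is a hedged illustration rather than the measure defined in Chapter 3: the function names are hypothetical, and the frequency-domain variant assumes that the SNR is taken between DFT magnitudes (if complex DFT values were compared instead, Parseval's relation would make it identical to the time-domain figure). Segments with zero signal energy or zero error are not handled.

```python
import cmath
import math


def snr_db(ref, test):
    """10*log10(signal energy / error energy) for one segment."""
    signal = sum(r * r for r in ref)
    noise = sum((r - t) ** 2 for r, t in zip(ref, test))
    return 10.0 * math.log10(signal / noise)


def dft_magnitudes(segment):
    """Magnitudes of the DFT of one segment (direct O(n^2) evaluation)."""
    n = len(segment)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(segment)))
        for k in range(n)
    ]


def segsnr(original, processed, seg_len, frequency_domain=False):
    """Average the per-segment SNRs over all complete segments."""
    values = []
    for start in range(0, len(original) - seg_len + 1, seg_len):
        ref = original[start:start + seg_len]
        out = processed[start:start + seg_len]
        if frequency_domain:
            # Assumption: compare DFT magnitudes, not complex values
            ref, out = dft_magnitudes(ref), dft_magnitudes(out)
        values.append(snr_db(ref, out))
    return sum(values) / len(values)
```

At a sampling rate of 6400 Hz, the 20 msec segment mentioned above corresponds to `seg_len = 128`.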

Figure 2.6 shows the TDHS output speech plot for the word "CATS" for
different types of window functions. For the trapezoidal and Tukey windows
the energy fluctuations in transition regions are more accurately
reconstructed than for the rest. The trapezoidal window function could be
an attractive alternative to the triangular window function since further
savings in multiplication operations could be achieved. This is
demonstrated as follows. From Table 2.1, the trapezoidal window function is

$$h(n:N_p) = \begin{cases} 1, & 1 \le n \le N_p/2 \\ 2 - 2n/N_p, & N_p/2 + 1 \le n \le N_p \end{cases}$$

and from Eq. (2.9), the frequency compression operation is given by

$$y(n) = s(n+N_p) + h(n:N_p)\,[s(n) - s(n+N_p)], \qquad n = 1, 2, \ldots, N_p$$

For the trapezoidal window function, this reduces to

$$y(n) = \begin{cases} s(n), & 1 \le n \le N_p/2 \\ s(n+N_p) + h(n:N_p)\,[s(n) - s(n+N_p)], & N_p/2 + 1 \le n \le N_p \end{cases}$$

Table 2.1 (fragment)  Hamming: 0.54 + 0.46 cos(...)

Fig. 2.6  TDHS output speech for the word "CATS" (panels: original
speech, triangular window, trapezoidal window, Papoulis window).

From the comparison of the above equations, it can be seen that it requires
Np/2 multiplications and Np additions to produce Np compressed speech
samples for the trapezoidal window, as compared to Np multiplications and
2Np additions for the triangular window.

The quality of output speech generated by using either the triangular
or the trapezoidal window is almost the same. Therefore, the choice depends
mainly upon the simplicity of implementation in hardware.
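
The operation count for the trapezoidal window can be checked with a small Python sketch. The function name and the 1-based list convention are assumptions; the counters track only the signal-path arithmetic and treat the window values as if they were read from a table.

```python
def compress_trapezoidal(s, n_p):
    """Compress s(1)..s(2Np) into Np samples with the trapezoidal window.

    s is 1-based (s[0] unused) and n_p is assumed even.
    Returns (output, multiplications, additions), counting only the
    signal-path operations and treating window values as tabulated.
    """
    half = n_p // 2
    out, mults, adds = [], 0, 0
    for n in range(1, n_p + 1):
        if n <= half:
            out.append(s[n])                   # h = 1: plain copy
        else:
            h = 2.0 - 2.0 * n / n_p            # sloping part of the window
            diff = s[n] - s[n + n_p]           # first addition
            out.append(s[n + n_p] + h * diff)  # one multiply, second addition
            mults += 1
            adds += 2
    return out, mults, adds
```

For Np = 8 the sketch reports 4 multiplications and 8 additions, i.e. Np/2 and Np, as stated above.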

2.5  COMPRESSION  RATIO 

In previous sections, a compression factor of 2 was considered. However,
other compression ratios are possible. As the value of this ratio
increases, more interharmonic aliasing of pitch harmonics results. Such
distortion could be tolerated in certain applications. In speech
communication, speech quality is important and, therefore, spectral
distortions need to be kept small. A compression ratio of 2:1 is acceptable
in this study. However, the distortion caused by the compression and
expansion process can be reduced by employing 3:2 compression.

Fig. 2.7 shows how three pitch periods can be compressed into two and
expanded back into three. These operations can be put into equation form as
follows. The frequency compressed speech, y(n), is given by

$$y(n) = s(n)\,h(n:2N_p) + s(n+N_p)\,[1 - h(n:2N_p)], \qquad n = 1, 2, \ldots, 2N_p \tag{2.12}$$

or

$$y(n) = s(n+N_p) + h(n:2N_p)\,[s(n) - s(n+N_p)], \qquad n = 1, 2, \ldots, 2N_p \tag{2.13}$$

where s(n) denotes the original speech samples, Np is the pitch period
expressed in number of samples, and h(n:2Np) is the window function given by

$$h(n:2N_p) = 1 - \frac{n-1}{2N_p-1}, \qquad 1 \le n \le 2N_p \tag{2.14}$$
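
A minimal Python sketch of the 3:2 compression step of Eq. (2.13) is given below. The function name and the 1-based list convention are assumptions introduced for illustration; only the compression direction is shown.

```python
def compress_3_to_2(s, n_p):
    """Compress three pitch periods s(1)..s(3Np) into 2Np samples.

    s is 1-based (s[0] unused), a convention assumed here so that the
    indices match Eq. (2.13). The window is the ramp of Eq. (2.14).
    """
    out = []
    for n in range(1, 2 * n_p + 1):
        h = 1.0 - (n - 1) / (2 * n_p - 1)   # 1 at n = 1, 0 at n = 2Np
        out.append(s[n + n_p] + h * (s[n] - s[n + n_p]))
    return out
```

As in the 2:1 case, the first output sample equals s(1) and the last equals s(3Np), so the segment joins its neighbours without a discontinuity.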

The frequency expansion operation, with the ratio of 2:3, is performed
similarly to the 1:2 expansion discussed earlier, except for a different
window function, and is given by

$$s(n) = y(n)\,[1 - h(n:3N_p)] + y(n+N_p)\,h(n:3N_p)$$

or

$$s(n) = y(n) + h(n:3N_p)\,[y(n+N_p) - y(n)], \qquad n = 1, 2, \ldots, 3N_p \tag{2.15}$$

where

$$h(n:3N_p) = 1 - \frac{n-1}{3N_p-1}, \qquad 1 \le n \le 3N_p \tag{2.16}$$

Frequency scaling operations discussed above were simulated. The
distortion found in the output speech was less than with the 2:1
compression scheme, as anticipated. This is evident from the results listed
in Table 2.2.


TABLE 2.2

Comparison of 2:1 and 3:2 compression ratio schemes.

Sentence 1, male speaker, block size = 80 samples, searching range: 20<T<100

Type of        SEGSNR (time domain)      SEGSNR (frequency domain)
window         2:1         3:2           2:1          3:2

Triangular     8.06 dB     9.34 dB       11.39 dB     13.41 dB
Hanning        8.42 dB     10.31 dB      11.88 dB     15.01 dB
Tukey          7.30 dB     9.50 dB       11.24 dB     16.50 dB
Trapezoidal    7.48 dB     9.43 dB       11.28 dB     15.64 dB

An increase in frequency-domain segmental SNR of as much as 5 dB can be
obtained by employing the 3:2 compression scheme. An increase in SEGSNR
usually indicates an improvement in speech quality. However, the
correlation between the extent of such improvement and the increase in
SEGSNR is not known and is a separate topic of research [Barnwell, 1979].

Although this scheme looks promising, it has certain drawbacks. The
number of bits per sample available for quantization is reduced
significantly. For example, for a bit rate of 9.6 kb/s and a sampling rate
of 6.4 kHz, the bits available per sample are reduced from 3 to 2.25 for
the 3:2 compression ratio scheme. Such a reduction in available entropy not
only leads to more quantization noise, but also makes fewer bits available
for error protection, thus making the system more susceptible to channel
noise.
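
As a worked check of these figures, the bits per sample follow from dividing the bit rate by the number of samples transmitted per second. Here R denotes the bit rate and f_s the input sampling rate (symbols introduced for this illustration), and bits spent on side information are ignored:

$$\frac{R}{f_s \cdot \tfrac{1}{2}} = \frac{9600}{3200} = 3 \ \text{bits/sample (2:1)}, \qquad \frac{R}{f_s \cdot \tfrac{2}{3}} = \frac{9600 \times 3}{6400 \times 2} = 2.25 \ \text{bits/sample (3:2)}$$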

For a robust 9.6 kb/s system, a compression ratio of 2:1 was thought to
be a good choice. However, for higher bit rates and/or a less noisy
channel, a compression scheme of 3:2 would be an attractive choice. Should
the pitch periods (Np) form the side information, a 2:1 scheme would
require transmitting the pitch information every 2Np samples, as against
every 3Np samples in a 3:2 scheme. Therefore, a 3:2 scheme would require
33% less bit rate to transmit the pitch than a 2:1 scheme.

2.6  PITCH EXTRACTION

In earlier sections, it was shown that the TDHS algorithm consists of
properly weighting several adjacent input signal segments with
pitch-dependent duration by a suitable window function to produce an output
segment. In the frequency domain, the time-domain operations are equivalent
to shifting the individual pitch harmonics of the quasi-periodic
voiced-speech signal according to the center frequency of the subband in
which each harmonic component is located. The number of subbands into which
the speech band is divided is pitch dependent. The pitch-adaptive nature of
the algorithm requires a pitch extraction operation in the system. The
choice of a method to be used for extraction depends on how accurate the
pitch estimation needs to be.

Since the TDHS algorithm is pitch adaptive, an error in the pitch
estimation may cause a distortion in the output speech. Malah, in his
recent work [1981], has given the upper bound for an error in the pitch
period estimation for the exact reconstruction of the signal harmonics
after frequency compression followed by frequency expansion. The upper
bound given by Malah is inversely proportional to the compression ratio and
to the maximum frequency to be reconstructed exactly. For fairly good
quality speech reproduction, at least the second formant frequency of the
voiced speech should be reconstructed correctly. This fact can crudely be
verified by listening to speech which is passed through a low-pass filter
with different cutoff frequencies. The second formant frequencies for most
of the vowels are located below 1.5 to 2 kHz [Rabiner and Schafer, 1978].
With a compression ratio of 2, the allowed pitch-period errors for the
above frequencies become 0.16 and 0.125 msec, respectively. With a sampling
rate of 6400 Hz, this is equivalent to an error of 0.8 to 1 sample. In the
simulation studies, it was noticed that a pitch error of twice this amount
could be tolerated, which agrees with Malah's observation [1981].
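
The conversion from the quoted time tolerances to samples is a one-line calculation. In the worked arithmetic below, f_s is the 6400 Hz sampling rate and ΔT and Δn denote the pitch-period error in time and in samples (symbols introduced here for illustration):

$$\Delta n = \Delta T \cdot f_s: \qquad 0.16\ \text{ms} \times 6400\ \text{Hz} \approx 1.0\ \text{sample}, \qquad 0.125\ \text{ms} \times 6400\ \text{Hz} = 0.8\ \text{sample}$$

The two time tolerances are consistent with a bound of the form ΔT ≤ 1/(2 q f_max), with compression ratio q = 2 and f_max = 1.5 and 2 kHz, which gives about 0.167 and 0.125 msec. This form is inferred from the quoted numbers and from the stated inverse proportionality; it is not taken from Malah's paper.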

Several pitch extraction techniques exist in the literature [Gold &
Rabiner, 1969; Ross, et al., 1974; Sondhi, 1968; Rabiner, 1977]. Many of
these techniques have some form of logic for making a voiced/unvoiced
(V/UV) decision as well as for pitch data smoothing. The algorithm
presented here does not need such a complex technique and, therefore, only
simple techniques will be studied. Two such techniques are autocorrelation
[Rabiner, 1977] and AMDF [Ross, et al., 1974]. The pitch was estimated by
using the above methods for original speech, center clipped speech [Sondhi,
1968], 3-value center clipped speech [Dubnowski, 1976] and 2-value clipped
speech. The autocorrelation and AMDF methods, combined with the four
different types of speech input, form essentially eight different
techniques. The methods and the results are discussed in the following
paragraphs.

Simple methods such as autocorrelation and AMDF were tried for pitch
estimation. These methods estimate the pitch periods quite accurately in
the voiced segments of speech, except for double or triple pitch picking.
The double pitch period corresponds to performing the filter bank analysis
with twice as many filters. This means that only every other filter
contains a pitch harmonic. This algorithm does not need any voiced-unvoiced
decision. All these reasons made it possible to use a simple pitch
extraction method.

The autocorrelation method involves forming the short-time
autocorrelation function as in Eq. (2.17)

$$R(\ell) = \sum_{m=0}^{N-1} s(m)\,s(m+\ell), \qquad T_{\min} \le \ell \le T_{\max} \tag{2.17}$$

The autocorrelation function representation of the signal is a
convenient way of displaying certain properties of the signal. For example,
if the signal is periodic with period equal to T samples, it can be shown
that

$$R(\ell) = R(\ell + T)$$

and R(l) attains its maximum value at l = 0, ±T, ±2T, ... The pitch
determination involves computing R(l) as in Eq. (2.17) for different lags
l and locating the maximum. To check the periodicity of the waveform one
needs to check at least two periods. Therefore, the searching range for l
is of the order of two pitch periods. The block size N should be chosen to
give a good indication of the changing properties of the speech signal. A
block size on the order of a pitch period was found to be a good choice.

The above method, though simple to implement, involves extensive
computations. For example, if the block length is N and the searching
range is r, then for each lag in the range there are N multiplications and
N additions. Hence the total number of multiplications for finding the
pitch period becomes N*r; if r = 150 and N = 80 then this number is 12000.
This is just for one block of samples. This many multiplications consume
significant processing time, which may cause problems in a real-time
implementation. This led to a search for another technique which is
computationally simpler and yet provides accurate results.
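
The autocorrelation search of Eq. (2.17) can be sketched in Python as below. The function name and argument names are hypothetical; the sketch simply evaluates R(l) for every lag in the range and returns the lag of the largest value, without any of the smoothing or voicing logic used by more elaborate extractors.

```python
def autocorr_pitch(s, n, t_min, t_max):
    """Return the lag l in [t_min, t_max] that maximizes
    R(l) = sum_{m=0}^{N-1} s(m) * s(m + l).

    s must hold at least n + t_max samples. Each lag costs n
    multiplications, so the search costs n * (t_max - t_min + 1)
    multiplications in total.
    """
    best_lag, best_r = t_min, float("-inf")
    for lag in range(t_min, t_max + 1):
        r = sum(s[m] * s[m + lag] for m in range(n))
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag
```

On a test signal that repeats every 40 samples, with N = 80 and a search range of 20 to 60, the sketch returns a lag of 40.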

This is possible by the use of the Average Magnitude Difference Function
(AMDF). This technique is based upon the idea that for a truly periodic
input of period T, the sequence

$$d(n) = s(n+k) - s(n)$$

would be zero for k = 0, ±T, ±2T, ... For a short segment of voiced
speech, it is reasonable to expect that d(n) will be small at multiples
of the period, but not identically zero. The short-time average magnitude
of d(n) as a function of k should be small whenever k is close to the
period. The short-time AMDF [Ross, et al., 1974] is thus defined as

$$A(k) = \sum_{n=0}^{N-1} |s(n+k) - s(n)|, \qquad T_{\min} \le k \le T_{\max} \tag{2.18}$$

The considerations for the choice of block size and searching range are
the same as discussed above. The AMDF is implemented with subtraction,
addition and absolute value operations, in contrast to addition and
multiplication operations for the autocorrelation function. With floating
point arithmetic, where multiplies and adds take approximately the same
time, about the same time is required for either method with the same
window length. However, for special purpose hardware, or with fixed point
arithmetic, the AMDF appears to have an advantage. In this case, multiplies
usually are more time consuming and, furthermore, either scaling or a
double precision accumulator is required to hold the sum of lagged
products.
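
A matching Python sketch of the AMDF search of Eq. (2.18) follows. The names are again hypothetical, and the sketch picks the lag with the smallest A(k) with no further decision logic.

```python
def amdf_pitch(s, n, t_min, t_max):
    """Return the lag k in [t_min, t_max] that minimizes
    A(k) = sum_{i=0}^{N-1} |s(i + k) - s(i)|.

    s must hold at least n + t_max samples. Only subtractions,
    additions and absolute values are needed; no multiplications.
    """
    best_lag, best_a = t_min, float("inf")
    for k in range(t_min, t_max + 1):
        a = sum(abs(s[i + k] - s[i]) for i in range(n))
        if a < best_a:
            best_lag, best_a = k, a
    return best_lag
```

For an exactly periodic test signal the minimum is zero at the true period, so the search over 20 to 60 samples returns 40 for a signal that repeats every 40 samples.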

One of the major limitations of the autocorrelation representation is
that in a sense it retains too much of the information in the speech
signal. As a result, the autocorrelation function has many peaks. Most of
these peaks can be attributed to the damped oscillations of the vocal
tract response, which are responsible for the shape of each period of the
speech wave. Usually the peak at the pitch period has the greatest
amplitude (smallest in the case of AMDF). However, rapidly changing formant
frequencies can cause bigger autocorrelation peaks than those due to the
periodicity of the vocal excitation. In such cases, the simple procedure
of picking the largest peak in the autocorrelation will fail.


To avoid this problem it is again useful to process the speech signal
so as to make the periodicity more prominent while suppressing other
distracting features of the signal. Techniques which perform this type of
operation on a signal are sometimes called "spectrum flatteners" since
their objective is to remove the effects of the vocal tract transfer
function, thereby bringing each harmonic to the same amplitude level, as
in the case of a periodic impulse train. Numerous spectrum flattening
techniques have been proposed [Sondhi, 1963]; one technique, called
"center clipping" [Sondhi, 1968], is obtained by a nonlinear
transformation

$$y(n) = C[x(n)]$$

where C[ ] is shown in Fig. 2.8(a). The operation of the center clipper
is depicted in Fig. 2.8(b). From a block of speech samples the absolute
maximum amplitude $S_{max}$ is found; the clipping level $C_L$ is set
equal to a fixed percentage of $S_{max}$. For samples above $C_L$, the
output of the center clipper is equal to the input minus the clipping
level. For samples below the clipping level, the output is zero.
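The center clipping operation can be sketched as below. The text describes the positive side only; the treatment of negative samples (input plus the clipping level when the input is below the negative clipping level) is assumed here to be symmetric. The function name and the default of 30% are illustrative.

```python
def center_clip(block, fraction=0.3):
    """Center clip one block; the clipping level is fraction * max |x|."""
    s_max = max(abs(x) for x in block)  # absolute maximum amplitude
    c_l = fraction * s_max              # clipping level
    out = []
    for x in block:
        if x > c_l:
            out.append(x - c_l)         # input minus the clipping level
        elif x < -c_l:
            out.append(x + c_l)         # assumed symmetric negative side
        else:
            out.append(0)               # within the clipping band
    return out


print(center_clip([10, 2, -1, -10, 5], fraction=0.3))  # [7.0, 0, 0, -7.0, 2.0]
```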

Clearly, setting the clipping level is important. If the clipping
level is high, only a small number of peaks will exceed it and only a
few undesirable pulses will occur in the output. The clearest indication
of periodicity is therefore obtained for the highest possible clipping
level. There is, however, a difficulty with using too high a clipping
level. The amplitude of the signal may vary appreciably across the
duration of the speech segment (e.g. at the beginning or end of voicing),
so that if the clipping level is set at a high percentage of the maximum
amplitude across the whole segment, much of the waveform may fall below
the clipping level and be lost. In the simulation studies, it was found
that this situation is avoided if the clipping level is kept around 30%.
The same observation was made by Sondhi [1968]. If the clipping level is
more than 60%, the situation noted above usually does occur and the
estimate of the pitch period is in error. This is shown in Table 2.3 by
the arrows pointing to the wrong pitch periods. A procedure [Rabiner and
Schafer, 1978] which permits a greater percentage (60-80%) to be used is
to find the peak amplitude in both the first third and the last third of
the segment and set the clipping level at a fixed percentage of the
minimum of these two maximum levels. This procedure was incorporated in
the pitch extraction algorithm and the results are shown in Table 2.3.

Fig. 2.8  (a) Center clipping function; (b) input speech and center
clipped speech

The table shows the analysis intervals and the corresponding pitch
periods. The analysis interval is chosen to be 200 samples. The first and
last sample numbers of each interval are obtained as follows. Suppose the
first sample number of an analysis interval is n. The last sample number
of the interval is then n+199, making the length equal to 200. Let the
pitch period for the analysis interval (n to n+199) be Np. The next
interval becomes (n+Np to n+2Np+199). Since the different pitch
extraction techniques give different pitch values in unvoiced segments of
speech, their analysis intervals differ. This can also happen because of
double or triple pitch picking in voiced speech. Comparing the results in
Table 2.3, it can be seen that pitch errors due to a high clipping level
are corrected by the procedure outlined above. Note that double or triple
pitch periods are not considered pitch errors because they do not degrade
the performance.
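The first-third/last-third rule for setting the clipping level can be sketched as follows. The function name and the default fraction of 0.7 (inside the quoted 60-80% range) are illustrative assumptions, not values fixed by the study.

```python
def clipping_level(block, fraction=0.7):
    """Clipping level from the peaks of the first and last thirds of a block."""
    third = len(block) // 3
    if third == 0:
        raise ValueError("block must contain at least three samples")
    peak_first = max(abs(x) for x in block[:third])
    peak_last = max(abs(x) for x in block[-third:])
    # Using the smaller of the two peaks keeps a decaying or growing segment
    # from being clipped away entirely.
    return fraction * min(peak_first, peak_last)


# Hypothetical segment whose amplitude falls towards the end.
segment = [1, 8, 2, 9, 9, 9, 3, 4, 1]
print(clipping_level(segment, fraction=0.5))  # 2.0
```

With the whole-segment maximum (9) the same fraction would give a level of 4.5, which would remove the final third of this segment completely.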

A simple modification of the center clipping function leads to a
great simplification in the computation of the autocorrelation function
with essentially no degradation in utility for pitch detection [Rabiner,
Schafer, Dubnowski, 1976]. This modification is shown in Fig. 2.9. As
indicated there, the output of the clipper is +1 if x(n) > $C_L$ and -1
if x(n) < -$C_L$. Otherwise the output is zero. Although this operation
tends to emphasize the importance of peaks that just exceed the clipping
level, the autocorrelation function is very similar to that of the
center clipper of Fig. 2.8.

The computation of the autocorrelation function for a 3-level center
clipped signal is particularly simple. The product $s(k)\,s(k+\ell)$ in
Eq. (2.17) can have only three values:

$$s(k)\,s(k+\ell) = \begin{cases} 0 & \text{if } s(k) = 0 \text{ or } s(k+\ell) = 0 \\ +1 & \text{if } s(k) = s(k+\ell) \\ -1 & \text{if } s(k) \ne s(k+\ell) \end{cases}$$

Thus, in hardware terms, only simple combinatorial logic and an up-down
counter are required to accumulate the autocorrelation value for each
value of $\ell$. Likewise, the input to the AMDF algorithm could also be
center clipped speech.
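The up-down counter view of the 3-level autocorrelation can be sketched in software as below. The summation range (all sample pairs available in the block) and the function names are assumptions of this sketch.

```python
def three_level_clip(block, c_l):
    """Map each sample to +1, -1 or 0 against the clipping level c_l."""
    return [1 if x > c_l else -1 if x < -c_l else 0 for x in block]


def clipped_autocorr(c, lag):
    """Autocorrelation of a 3-level signal using comparisons and a counter."""
    count = 0  # plays the role of the up-down counter
    for k in range(len(c) - lag):
        a, b = c[k], c[k + lag]
        if a == 0 or b == 0:
            continue                    # product is 0: counter unchanged
        count += 1 if a == b else -1    # product is +1 or -1
    return count


clipped = three_level_clip([5, -5, 0, 5, -5, 0, 5, -5], c_l=2)
print([clipped_autocorr(clipped, lag) for lag in range(4)])
```

No multiplication is performed; each sample pair either leaves the counter alone or moves it up or down by one.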

The two-level or infinite clipper shown in Fig. 2.10 was tried as a
hardware simplification of the 3-level clipper described above. However,
the pitch period estimation was not accurate. The explanation can be
derived from Licklider and Pollack's [1948] experiment. They showed that,
whereas speech that has been infinitely peak clipped is highly
intelligible, even a few percent of center clipping drastically reduces
intelligibility. The reason is that infinite peak clipping retains the
formants of the speech signal (although it introduces a few secondary
"formants"), whereas center clipping destroys the formant structure while
retaining the periodicity. It is the removal of formant structure that is
so important for a good pitch extractor.

Table 2.4 shows the output of the pitch extractor for seven different
techniques. The one which uses the three-level center clipper was found
to be accurate enough for the desired applications and hence was selected
for its hardware simplicity. Note that, as discussed earlier, the
analysis intervals differ for the different pitch estimators. Hence, care
must be taken to compare the pitch period variations over the entire
segment of voiced speech instead of one particular analysis interval.

The hardware simplicity of the pitch detector becomes quite important
because of a unique feature of this algorithm: the pitch is extracted at
the receiver from the reconstructed compressed speech. Having two pitch
extractors, one at the transmitter and another at the receiver, increases
the cost of the system hardware. However, for the 3-level center clipped
autocorrelation method, hardware requirements are minimal and hence the
added hardware cost is marginal. Pitch extraction at the receiver is a
little more difficult because the input to the pitch detector is noisy
compressed speech, possibly with channel errors. The noise in the
reconstructed compressed speech is quantization noise, which is small
enough not to cause pitch detection errors. However, fewer pitch periods
per unit time are available in the compressed speech. This is the major
problem in extracting pitch at the receiver. The problem could become
serious for male speakers because of their low fundamental frequency. The
pitch extraction techniques outlined above can still be used at the
receiver; however, a judicious choice of the blocklength and the
searching range is necessary. At the transmitter, a blocklength of 80 and
a searching range of (20, 100) samples were found to be a good choice. At
the receiver, a blocklength of 50 and the same searching range were found
to be satisfactory. These conclusions are based on the results obtained
for three sentences. A study of more utterances might be necessary to
arrive at the best values for the blocklength and the searching range.

Pitch detection at the receiver avoids the transmission of pitch as
side information, thus making the blocking of the quantizer data and
block synchronization unnecessary. However, channel noise can affect the
compressed speech appreciably, which in turn would cause wrong pitch
estimates. For a bit-error-rate (BER) of 1%, this situation does occur
quite frequently, but it occurs less frequently if error protection is
provided for the quantizer levels. In the simulation studies, the speech
quality for noisy channels with or without pitch transmission was found
to be the same. This may be because the distortion due to errors in the
quantizer levels masks the distortion caused by the pitch errors. The
results and further discussion of this are postponed until Chapter 4.

2.7  SUMMARY

Time domain harmonic scaling of speech was found to be an effective
approach to representing speech with fewer samples while maintaining
excellent quality. A simple triangular window function was used for the
compression and expansion operations. A compression ratio of 2:1 was
found to be a good choice. A 3-level center clipped autocorrelation
technique without any decision logic, V/UV decision or pitch data
smoothing was found to be adequate for the purpose. The pitch is
extracted both at the transmitter and at the receiver. The harmonic
distortion produced by the TDHS operations could not be noticed, owing to
the masking properties of the ear. The TDHS output is coded using an
ADPCM technique which is described in the next chapter.


CHAPTER  3


ADAPTIVE  RESIDUAL  CODER 


3.1  INTRODUCTION 

In the previous chapter, the compression and expansion operations
performed at the transmitter and at the receiver, respectively, were
discussed. Compressed speech, which consists of half the number of
original samples for a 2:1 compression ratio, needs to be coded and
transmitted on the channel. This can be achieved by employing various
schemes. Compressed speech is formed by using long-term redundancy in a
manner different from the APC technique. Since some of the long-term
redundancy in the original speech is already removed in forming the
compressed speech, use of an APC coder to code it would be inefficient.
However, compressed speech can be coded effectively by exploiting the
short-term redundancy. This is done by using ADPCM coders.

The Adaptive Residual Coder (ARC) system, developed by Cohn and
Melsa [Sept. 1975], is an improved ADPCM system. The following sections
describe the system structure of ARC and outline its design and
performance with compressed speech as the input signal. Various
performance measures were tested for the ARC system and are also
discussed in this chapter.

3.2  THE  RESIDUAL  ENCODER  STRUCTURE 

Fig. 3.1 shows the basic ADPCM structure augmented with dashed lines
to indicate the flow of information for adaptation in the residual
encoder. The underlying design principle of the adaptation procedure is
that all information used in updating the quantizer and predictor be
available at both the transmitter and the receiver. Since the only
information sent from the transmitter to the receiver is the quantizer
output $q_k$, all adaptation procedures must use quantities derivable
from the $q_k$.

As mentioned earlier, the compressed speech samples $y_k$ form the
input to the ARC system. The sampling rate of $y_k$ depends on the
sampling frequency of the original speech and the compression ratio
employed in the harmonic scaling operations. For example, if the input
speech bandwidth is 3200 Hz and it is sampled at the Nyquist rate of
6400 Hz, then with a 2:1 compression ratio the sampling frequency of
$y_k$ becomes 3200 Hz and the bandwidth is compressed into 1600 Hz.

The  system  shown  in  Fig.  3.1  includes  the  pitch  compensating  adaptive 
quantizer.  The  object  of  an  adaptive  quantizer  is  to  match  the  quantizer 
to  the  local  statistics  of  the  speech  rather  than  to  global  or  "average" 
statistics.  Thus,  if  the  speaker  is  talking  loudly,  a  quantizer  with  a 
wide  range  should  be  used;  but  if  the  speaker  is  talking  quietly,  a  narrow 
range  should  be  used.  In  fact,  the  dynamic  range  of  human  speech  varies 
significantly  from  syllable  to  syllable. 

The usual method of quantizer adaptation is to adjust the range of
the quantizer as a function of the prior quantizer output. Consider the
quantizer structure shown in Fig. 3.2. Here the input is normalized by a
state variable $\sigma_k$ and then quantized with a fixed quantizer. If
the expected range of the input signal is large, $\sigma_k$ should be
large; if the expected range is small, $\sigma_k$ should be small. If the
prior quantizer input was large, then from the correlation between
successive samples, one might expect the next input to be large as well.
Therefore, if the quantizer output is large, $\sigma_k$ should be
increased. Similarly, if the quantizer output is small, $\sigma_k$ should
be decreased.

Fig. 3.2  Adaptive Quantizer

Two different approaches have been suggested for quantizer adaptation:
forward adaptation and backward adaptation. In a forward scheme, the
adaptation decision is based on unquantized data and is communicated to
the receiver as side information. In a backward scheme, the decision is
based on the quantizer output, which the receiver already has, so no
side information is needed. Performance tests indicate (Noll, 1974) that
the forward method is slightly better if no cost is assessed for the side
information. Practical considerations, however, seem to favor backward
adaptation.

Several investigators have evaluated adaptive PCM and DPCM systems
that incorporate adaptive quantizers (Cummiskey, et al., 1973; Gibson, et
al., 1974; Castellino, et al., 1974; Stroh, 1971). These systems have
consistently shown considerable performance improvements over those which
do not use adaptive quantizers. The adaptation procedures do require a
mild increase in system complexity; some, however, can be realized with
simple shift-and-add operations (Qureshi and Forney, 1975).

Adaptive quantizers have been the subject of intense theoretical
study (Goodman and Gersho, 1974; Mitra, 1974a; Mitra, 1975; Cohn and
Melsa, 1975). One result of significant intuitive interest unites two
apparently different schools of thought on adaptation design. One
approach, used in both backward and forward adaptation, normalizes the
quantizer input by an estimate of its envelope or RMS level. The other
method, first proposed by Jayant (1973), scales the normalization constant
according to the prior output; if it is an outer level, the constant is
increased, and if it is an inner level, the constant is decreased. Cohn
and Melsa (1975a) showed that if some mild conditions are met, the Jayant
method is also an envelope estimation.
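The Jayant rule described above can be sketched in a few lines. The multiplier values and the clamping limits below are hypothetical placeholders chosen for illustration; they are not taken from Jayant (1973) or from this study.

```python
def jayant_update(sigma, level, multipliers, sigma_min=1.0, sigma_max=2048.0):
    """Scale the normalization constant by the multiplier of the prior output.

    level indexes the magnitude of the prior quantizer output, from 0
    (innermost) to len(multipliers) - 1 (outermost).
    """
    sigma = sigma * multipliers[level]
    # Clamping is an added safeguard in this sketch, not part of the rule.
    return min(max(sigma, sigma_min), sigma_max)


# Hypothetical multipliers: below 1 for inner levels, above 1 for outer ones.
MULTIPLIERS = [0.8, 0.9, 1.2, 2.0]

sigma = 10.0
for level in [3, 3, 0, 0, 0]:  # two outer-level outputs, then three inner ones
    sigma = jayant_update(sigma, level, MULTIPLIERS)
    print(round(sigma, 3))
```

The constant grows quickly while outer levels occur and shrinks geometrically while inner levels occur, which is why the rule behaves like an envelope estimator.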

Initial  designs  of  Jayant  quantizers  proved  to  be  very  sensitive  to 
channel  errors.  Later  work  (Cohn  and  Melsa,  1975a;  Goodman  and  Wilkinson, 
1975;  Cohn  and  Melsa,  1975b;  Qureshi  and  Forney,  1975)  has  shown  ways  of 
modifying  the  algorithm  to  eliminate  this  problem. 

It is clear, then, that adaptive quantizers improve performance
because they match their range to the dynamic range of the incoming
signal. A pitch compensating quantizer (Cohn and Melsa, 1976) extends
this notion by noting that during voiced speech the dynamic range of the
input is critically dependent on the proximity of the last pitch pulse.

It is well known that during voiced speech, the signal strength is a
local maximum shortly after a pitch pulse and tends to decay towards the
end of the pitch period. This effect is even more pronounced in the
difference signal that is quantized in DPCM systems. Although the
predictor may closely approximate the actual speech away from a pitch
pulse, it generally cannot predict the next pitch pulse. The design
objective of a pitch compensating quantizer is to adapt both to the long
term syllable variations in signal strength and to the short term
variations.

The backward adapting pitch compensation algorithm uses a quantizer
whose outermost levels have been set at higher values than is normal.
When the quantizer output is one of these outermost levels, the
adaptation algorithm reacts as if a pitch pulse had been detected; the
quantizer state variable is significantly increased and then permitted to
decay rapidly back to its long-term value.

The pitch compensating quantizer (PCQ) is employed in this algorithm.
The prediction error e(k) is the input to the quantizer, whose basic
design is illustrated in Fig. 3.2. The recommended thresholds are
symmetric and are illustrated in Fig. 3.3 and listed in Table 3.1. The
level in which the normalized input falls specifies the quantizer output
q(k). The inverse quantizer output $\hat{e}(k)$ is the quantized version
of the quantizer input. It is the product of a scaling factor f(q(k)) and
the state variable $\sigma(k)$. The recommended scale factors are
tabulated in Table 3.2. The recommended thresholds were computed to be
equidistant between the scaling factors.

The state variable $\sigma(k)$ is designed to be an approximation to the
standard deviation of e(k). Most of the time the scaled average of
$|\hat{y}(k)|$ is an acceptable estimate. However, in voiced speech at
the beginning of a pitch period, e(k) is much larger than usual.
Therefore, whenever one of the outermost quantizer levels occurs,
$\sigma(k)$ is expanded and then decays back to the scaled average of
$|\hat{y}(k)|$. Thus $\sigma(k)$ is updated by

$$\sigma(k) = \max\{\mathrm{SMIN}\,\langle|\hat{y}(k)|\rangle,\ \phi[q(k)]\,\sigma(k-1)\} \tag{3.1}$$

The first term in the braces of Eq. (3.1) usually dominates. This means
that quantizer behavior is largely determined by
SMIN $\langle|\hat{y}(k)|\rangle$ and, hence, by the product of SMIN and
RMSMIN. It is recommended that the scale factor SMIN be set to 0.25. The
second term in the braces only affects performance at the beginning of
pitch periods. The quantizer expansion factors $\phi[q(k)]$ are given in
Table 3.2.
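One step of the quantizer and of the state update in Eq. (3.1) might look as follows. The scaling and expansion factors are copied as printed in Table 3.2; choosing the nearest output level is used here as an equivalent of thresholds placed midway between the scaling factors. The ordering (quantize with the current state, then update the state from the level just produced) and all function names are assumptions of this sketch.

```python
# Magnitude levels 0..5 stand for q(k) = 1, (2,3), (4,5), (6,7), (8,9), (10,11).
F = [0.0, 0.6, 1.5, 3.0, 6.0, 12.0]   # scaling factors as printed in Table 3.2
PHI = [0.8, 0.9, 9.0, 1.5, 3.0, 6.0]  # expansion factors as printed in Table 3.2


def quantize(e, sigma, out_levels):
    """Return (magnitude level, sign) of the output level nearest to e / sigma."""
    x = abs(e) / sigma
    idx = min(range(len(out_levels)), key=lambda i: abs(out_levels[i] - x))
    return idx, (1 if e >= 0 else -1)


def inverse_quantize(idx, sign, sigma, out_levels):
    """Quantized prediction error: sign * scaling factor * state variable."""
    return sign * out_levels[idx] * sigma


def update_sigma(sigma_prev, idx, avg_abs, phi, smin=0.25):
    """Eq. (3.1): the larger of SMIN * <|y|> and phi[q] * previous sigma."""
    return max(smin * avg_abs, phi[idx] * sigma_prev)


sigma, avg_abs = 10.0, 40.0             # hypothetical starting values
idx, sign = quantize(130.0, sigma, F)   # a large error, as at a pitch pulse
print(idx, inverse_quantize(idx, sign, sigma, F))
sigma = update_sigma(sigma, idx, avg_abs, PHI)
print(sigma)
```

An outermost level multiplies the state variable by 6, after which repeated innermost levels (factor 0.8) let it decay until the floor SMIN times the average takes over again.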

The choice of output levels, expansion factors and other scalars in
the design of the quantizer is largely governed by the need to keep the
average source entropy within acceptable limits while at the same time
reducing the quantization noise. Usually, the average number of bits
available for coding quantizer levels depends on the available bit rate
and the sampling frequency. In this algorithm, a fixed length code with
run length is used. This requires that the quantizer level occupancy
statistics be such that the probability of run lengths is maintained
within a certain range. This keeps the average bit rate below the
allowable limit. A detailed discussion may be found in Chapter 4.


TABLE  3.1

Quantizer Thresholds

0.3
1.05
2.55
5.55
11.55


TABLE  3.2

Quantizer scaling and expansion factors

q(k)      f[q(k)]    φ[q(k)]
1         0.0        0.8
2,3       0.6        0.9
4,5       1.5        9.0
6,7       3.0        1.5
8,9       6.0        3.0
10,11     12.00      6.0

The ARC system employs a sequentially adaptive linear predictor. It
produces a linear prediction p(k) given by

$$p(k) = \sum_{i=1}^{N} a_i(k)\,\hat{y}(k-i) \tag{3.2}$$

which is to be an estimate of y(k). The $\hat{y}(k-i)$ are the receiver's
estimates of y(k-i). Since y(k) is frequency compressed to half the
original bandwidth, the number of poles required to represent the
compressed bandwidth would be half that required for the original
bandwidth. A predictor order of four was found to be a good choice.

If the $a_i(k)$ accurately model the y(k), and if the $\hat{y}(k-i)$ are
close to the y(k-i), then p(k) will be a good approximation to y(k). The
$a_i(k)$ are adaptive, and after p(k) is formed, they are updated. They
are adapted according to steepest descent of $e^2(k)$ [Melsa & Cohn,
1975]. This is approximated in the system by the following updating
algorithm:

$$a_i(k+1) = \delta b_i + (1-\delta)\left[a_i(k) + g\,\frac{\hat{y}(k-i)\,\hat{e}(k)}{\langle|\hat{y}(k)|\rangle}\right] \tag{3.3}$$

where $\langle|\hat{y}(k)|\rangle$ is a biased exponential time average
of $|\hat{y}(k)|$:

$$\langle|\hat{y}(k)|\rangle = (1-\alpha)\sum_{j=0}^{\infty}\alpha^{j}\,|\hat{y}(k-j)| + \mathrm{RMSMIN}$$

Thus, the $a_i(k)$ updating algorithm has eight parameters: $\delta$, g,
$\alpha$, RMSMIN and $b_i$ for i = 1, 2, 3 and 4. It was found in the
simulation studies that the effect of channel errors becomes more severe
if the memory in the updating process is increased. This memory is
essentially controlled by the choice of the parameters $\delta$, g and
$\alpha$. In order to minimize the effect of channel errors, the memory
time was reduced from what would be optimal in the error-free case. This
did not significantly degrade performance. The recommended values of
these parameters are

$\delta$ = 0.05

g = 0.02

$\alpha$ = 0.93
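The biased exponential time average lends itself to a one-line recursion, since (1-α) Σ α^j |ŷ(k-j)| obeys m(k) = α m(k-1) + (1-α) |ŷ(k)|. The sketch below assumes a zero history before the first sample; the class name is illustrative, and the default bias of 40 is the RMSMIN value quoted in this section for samples on [-2048, 2047].

```python
class BiasedAbsAverage:
    """Running value of (1 - alpha) * sum_j alpha**j * |y(k - j)| + RMSMIN."""

    def __init__(self, alpha=0.93, rmsmin=40.0):
        self.alpha = alpha
        self.rmsmin = rmsmin
        self.m = 0.0  # unbiased part of the average; history assumed zero

    def update(self, y_hat):
        # Recursive form of the exponentially weighted sum.
        self.m = self.alpha * self.m + (1.0 - self.alpha) * abs(y_hat)
        return self.m + self.rmsmin


avg = BiasedAbsAverage()
for _ in range(500):
    value = avg.update(100.0)   # constant-magnitude input
print(round(value, 3))          # 140.0
```

With α = 0.93 the effective memory is roughly 1/(1-α), or about 14 samples; the bias RMSMIN keeps the average from falling to zero during silence.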


The parameters $b_i$ represent the quiescent values of the coefficients
$a_i(k)$. The values used are


i = 1

i = 2, 3 or 4
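The prediction of Eq. (3.2) and the leaky coefficient update of Eq. (3.3) can be sketched as below. The normalizing term built from the biased average is passed in as `norm`, so the sketch assumes nothing about it beyond what Eq. (3.3) shows. The quiescent values in `B` are placeholders only, and the function names are illustrative.

```python
def predict(a, y_hat_past):
    """Eq. (3.2): p(k) = sum_i a_i(k) * y_hat(k - i); y_hat_past[0] is y_hat(k-1)."""
    return sum(ai * yi for ai, yi in zip(a, y_hat_past))


def update_coeffs(a, b, y_hat_past, e_hat, norm, delta=0.05, g=0.02):
    """Eq. (3.3): gradient step on each a_i, then a pull towards b_i."""
    return [
        delta * bi + (1.0 - delta) * (ai + g * yi * e_hat / norm)
        for ai, bi, yi in zip(a, b, y_hat_past)
    ]


B = [0.0, 0.0, 0.0, 0.0]   # placeholder quiescent values, not the report's
a = [0.9, -0.3, 0.1, 0.0]  # hypothetical current coefficients
for _ in range(200):       # with zero error the gradient term vanishes
    a = update_coeffs(a, B, [0.0, 0.0, 0.0, 0.0], 0.0, norm=40.0)
print([round(x, 6) for x in a])
```

When no error drives the update, each coefficient relaxes geometrically towards its quiescent value at the rate (1-δ) per sample, which bounds how long a channel error can persist in the predictor.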


The quantity RMSMIN is perhaps the most sensitive parameter in the
algorithm. It determines the minimum value of
$\langle|\hat{y}(k)|\rangle$, which affects both the adaptive predictor
and the adaptive quantizer. As RMSMIN decreases, the system responds more
during low level signals. This reduces granular noise and increases the
data rate. The higher data rate means that the buffer fills faster and
that buffer control is triggered, causing deterioration of speech
quality. When y(k) is represented on the interval [-2048, 2047], an
RMSMIN of 40 produces a good tradeoff.

3.3  PERFORMANCE

The performance of the TDHS-ARC system depends on how well the
frequency scaling operations are performed and how well the quantization
is done. As mentioned earlier, the ARC system is essentially an ADPCM
system and, for a given bit rate, its performance can be improved by
decreasing the quantization noise. However, a reduction in the
quantization noise does not necessarily mean an improvement in the
performance. Such situations will be discussed later in this section.

The bit rate available for coding quantizer levels depends on the bit
rate needed for error protection and for transmitting the side
information, if any. Since it was decided to extract pitch at the
transmitter and at the receiver as well, this system has no side
information. With a (26, 31) Hamming code for error protection [Hamming,
1980], a bit rate of 8 kb/s is available to code the quantizer levels
generated by ARC. The sampling rate is 3200 Hz, which leaves an average
of 2.5 bits per sample. The parameters were designed such that the
average value of the source entropy is around 2.5 bits per sample.
However, the instantaneous value of the bit rate has to be considered for
the buffer overflow problem.
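The bit budget can be checked with a short calculation. It assumes the 9.6 kb/s channel rate named in the introduction of this report and a code carrying 26 information bits in every 31 transmitted bits; the function name is illustrative.

```python
def bits_per_sample(channel_rate_bps, info_bits, block_bits, sample_rate_hz):
    """Payload rate after block coding, and the resulting bits per sample."""
    payload_rate = channel_rate_bps * info_bits / block_bits
    return payload_rate, payload_rate / sample_rate_hz


payload, per_sample = bits_per_sample(9600, 26, 31, 3200)
print(round(payload), round(per_sample, 2))  # 8052 2.52
```

The result is consistent with the rounded figures of 8 kb/s and 2.5 bits per sample used in the text.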

Before discussing the performance of the ARC system, some of the
performance indicators are presented. The single most widely used
indicator of speech coder performance is the SNR defined by

$$\mathrm{SNR} = \frac{\langle y^2(k)\rangle}{\langle|y(k)-\hat{y}(k)|^2\rangle} \tag{3.4}$$

or in dB by

$$\mathrm{SNR(dB)} = 10\log_{10}\mathrm{SNR} \tag{3.5}$$

where y(k) and $\hat{y}(k)$ are shown in Fig. 3.4(a) and (b), and
$\langle\cdot\rangle$ denotes the averaging operation. It is well known
that the SNR calculated as in Eqs. (3.4) and (3.5) is not a perfect
indicator of speech quality and intelligibility. This is because
perceived quality and intelligibility depend on the subjective loudness
of the quantization noise, not just on the quantization noise power,
which is the denominator of Eq. (3.4). The numerator of Eq. (3.4) is an
average of a squared term. Thus, speech with high amplitudes gets more
weighting in the calculation of the SNR. Makhoul & Berouti (Feb. 1979)
and Atal & Schroeder (June 1979) have shown that, due to the auditory
masking properties of the human ear, the subjective loudness is
determined by the spectrum of the quantization noise and its relationship
to the input signal spectrum. The SNR in (3.4) does not reflect these
considerations. However, the SNR remains popular since it is simple to
compute, and it can be useful if applied prudently.

Usually, SNR is calculated for the entire utterance. As discussed
earlier, the high energy segments of speech get more weighting in the SNR
computation than the low energy segments. Hence, SNR values indicate the
performance of the coder for high energy speech, or voiced speech. The
coder performance for low energy, or unvoiced, speech can be obtained by
computing the SNR in Eqs. (3.4) and (3.5) over many nonoverlapping blocks
of data within an utterance [Noll, 1975]. The time variation of the
resulting "short-term" SNR provides an indication of how well the coder
under consideration is performing on the various blocks of speech data.
Based on these short-term SNR values, a performance measure SNRSEG can be
defined as

$$\mathrm{SNRSEG} = \frac{1}{K}\sum_{j=1}^{K}\mathrm{SNR}_j(\mathrm{dB}) \tag{3.6}$$

where K is the number of blocks. Infrequent very large values of
$\mathrm{SNR}_j$ tend to show up better in SNRSEG than in the SNR of
(3.4) [Jayant, 1977]. One slight modification to the computation of
SNRSEG in (3.6) would be to exclude the $\mathrm{SNR}_j$ values for
blocks which contain silence. Usually, SNR values for silence are zero or
negative; those can be excluded from the SNRSEG computation.
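Eqs. (3.4) to (3.6) can be sketched as below. The function names, the option that drops blocks with zero or negative SNR, and the four-sample test signal are assumptions made for illustration.

```python
import math


def snr_db(y, y_hat):
    """Eqs. (3.4)-(3.5): signal power over error power, in dB."""
    signal = sum(v * v for v in y)
    noise = sum((v - w) ** 2 for v, w in zip(y, y_hat))
    return 10.0 * math.log10(signal / noise)


def snrseg_db(y, y_hat, block_len, skip_nonpositive=False):
    """Eq. (3.6): mean of the per-block SNR values in dB."""
    values = []
    for start in range(0, len(y) - block_len + 1, block_len):
        yb = y[start:start + block_len]
        hb = y_hat[start:start + block_len]
        signal = sum(v * v for v in yb)
        noise = sum((v - w) ** 2 for v, w in zip(yb, hb))
        if signal == 0 or noise == 0:
            continue                    # block SNR undefined in dB
        value = 10.0 * math.log10(signal / noise)
        if skip_nonpositive and value <= 0.0:
            continue                    # treat as silence and leave out
        values.append(value)
    if not values:
        raise ValueError("no usable blocks")
    return sum(values) / len(values)


y = [10.0, 10.0, 2.0, 2.0]       # one loud block, one quiet block
y_hat = [9.0, 9.0, 1.0, 1.0]     # same absolute error in both
print(round(snr_db(y, y_hat), 2), round(snrseg_db(y, y_hat, 2), 2))
```

In this example the overall SNR (about 17.2 dB) sits close to the loud block's 20 dB, while SNRSEG (about 13.0 dB) gives the quiet block equal weight.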

A distortion measure that is related to the SNR is the SPER given by
[Gibson, 1980]

$$\mathrm{SPER} = \frac{\langle y^2(k)\rangle}{\langle[y(k)-p(k)]^2\rangle} \tag{3.7}$$

where p(k) is the predicted value in Fig. 3.1. The SPER is motivated by a
desire to evaluate the predictor performance only, rather than to try to
obtain an indicator of the overall system performance, which in DPCM
includes both the quantizer and the predictor effects. Of course, due to
the closed loop nature of DPCM and the presence of the quantizer within
the loop, the SPER is also affected by the quantizer.

Another performance indicator for DPCM is the SNRI defined by
[McDonald, 1966; O'Neal & Stroh, 1972]

$$\mathrm{SNRI} = \frac{\langle y^2(k)\rangle}{\langle[y(k)-p_I(k)]^2\rangle} \tag{3.8}$$

where

$$p_I(k) = \sum_{i=1}^{N} a_i\,y(k-i).$$

The SNRI is simply the SPER when it is assumed that $\hat{y}(k) = y(k)$,
or $p(k) = p_I(k)$. The SNRI is interpreted as the amount by which linear
prediction can reduce the input signal power and hence is sometimes used
as a measure of the maximum utility of linear prediction in DPCM.
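A minimal sketch of the SNRI of Eq. (3.8) for fixed coefficients follows. The function name and the test signal are illustrative; the averages are taken over the samples for which a full set of N past values exists.

```python
def snri(y, a):
    """Eq. (3.8): input power over open-loop prediction error power."""
    n = len(a)
    num = den = 0.0
    for k in range(n, len(y)):
        # p_I(k) uses the unquantized past samples y(k-1) .. y(k-N).
        p = sum(a[i] * y[k - 1 - i] for i in range(n))
        num += y[k] ** 2
        den += (y[k] - p) ** 2
    return num / den  # raises ZeroDivisionError if the prediction is perfect


# Hypothetical input that doubles each sample, predicted by the previous value.
print(snri([1.0, 2.0, 4.0, 8.0, 16.0], [1.0]))  # 4.0
```

The SPER of Eq. (3.7) has the same form, with the closed-loop prediction p(k), formed from the reconstructed samples, in place of the open-loop prediction.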

Although the SNR, SPER and SNRI are easy to calculate and hence very
useful for initial system evaluations, they are not absolute indicators
of system performance. In fact, the SNR may not rank the coders correctly
in terms of speech quality and intelligibility. As a result, it is
advisable to augment these indicators with subjective listening tests and
frequency spectrum plots. By comparing the frequency spectrum of the
speech coder output with the frequency spectrum of the original speech,
conclusions can be drawn concerning how well the coder tracks the various
formants, how well it reproduces unvoiced or voiced sounds, and at which
frequencies quantizer noise is prevalent. By combining all these
techniques, coder degradation can be analyzed.

For  evaluating  the  performance  of  the  ARC  and  also  of  the  entire 
system,  the  following  three  phonetically  balanced  sentences  were  used. 

1)  "Cats  and  dogs  each  hate  the  other"  (Male  speaker). 

2)  "Move  the  vat  over  the  hot  fire"  (Male  speaker). 

3)  "The  pipe  began  to  rust  while  new"  (Female  speaker). 

The input data were low-pass filtered at 2900 Hz (3 dB) and sampled at 6.4
kHz. The SNR and the SEGSNR in the time and frequency domains are given in
Table 3.3. The frequency domain SNR indicates how well the input and output
speech match in the frequency domain. It is calculated by taking the DFT of
a short segment (20 msec) of input and output speech and calculating the
SNR as in Eq. (3.9).


TABLE 3.3

ARC Performance

Block length = 128 samples, Sampling freq. = 3200 Hz

Sentence     SNR         SEGSNR      SEGSNR        Entropy H
                         (time)      (frequency)
Male 1       20.00 dB    16.39 dB    19.28 dB      2.47 b/sample
Female 11    19.83 dB    17.57 dB    19.74 dB      2.59 b/sample
Male 6       20.01 dB    18.12 dB    20.61 dB      2.57 b/sample

$$\text{SNR (frequency domain)} = \frac{\langle |Y(n)|^2 \rangle}{\langle [\,|Y(n)| - |\hat{Y}(n)|\,]^2 \rangle} \tag{3.9}$$

where Y(n) and $\hat{Y}(n)$ are the frequency components of y(k) and
$\hat{y}(k)$. Fig. 3.4 shows the compressed speech y(k), the reconstructed
compressed speech $\hat{y}(k)$, and the quantization error. It can be seen
that the ARC system reconstructs the input speech very well, especially in
the voiced speech segments. This is clearer in the SEGSNR plot of Fig. 3.5,
where the block-to-block SNR is plotted against time for the whole
utterance. The dashed line shows the SNR in the frequency domain. The
distortions in the frequency domain can be seen from the frequency spectrum
plots of input and output speech, as well as of input and quantization
noise, in Figs. 3.6(a) and (b). The frequency spectrum of the quantization
noise lies well below that of the input compressed speech near the formant
frequencies, especially for the first and second formants. At higher
frequencies, however, quantization noise is perceptible since its spectrum
lies above the input signal spectrum. In informal listening tests, it was
found that this granular noise problem is much less severe than in
previously developed ADPCM or APC systems employing waveform reconstruction
techniques. Noise spectral shaping has been tried by a number of authors
[Atal & Schroeder, 1979; Makhoul & Berouti, 1979] to improve the speech
quality. However, such spectral shaping adds to the complexity of the
system. In the previous research studies for developing the PARC system
[Melsa, et al., 1980], it was noticed that increases in complexity and in
the levels of adaptation in the system lead to poorer performance in the
presence of channel errors. Hence, a compromise between robustness and
speech quality must be reached in the design of the ARC system. Fig. 3.7
shows the plot of SNR against signal strength. The desirable performance
would be a high value of SNR for all signal strengths. Such performance is
unlikely since the predictor performs very poorly for unvoiced speech,
which is noiselike. However, the parameters were designed such that the ARC
performance approaches such a curve.

Fig. 3.5  Block-to-block SNR plot for male utterance in ARC system.

The quantizer and predictor performance from segment to segment of speech
is shown in Figs. 3.8(a) and (b). The performance indicator SNRI of
Eq. (3.8) is also plotted in Fig. 3.8(a). It can be seen that the predictor
performance, SPER, is very close to the SNRI. This means that the predictor
is performing very well except in transition regions, where its poor
performance is anticipated. Total performance (SNR in dB) is the sum of the
predictor performance (SPER in dB) and the quantizer performance (SNRQ in
dB). Hence, increases in SPER and SNRQ lead to an increase in SNR. However,
such a simplification could be misleading since the performance of the
predictor depends on that of the quantizer. The plots in Fig. 3.8 are often
useful in evaluating the contributions of the predictor and the quantizer
in different parts of the speech utterance.
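
The additive relation between SNR, SPER and SNRQ can be illustrated with a small predictive coding loop. This is only a sketch under stated assumptions: SPER is taken as signal power over prediction error power, SNRQ as prediction error power over quantization noise power, and the fixed first-order predictor (`a = 0.8`) and uniform quantizer (`step = 0.1`) are hypothetical stand-ins for the adaptive predictor and quantizer of the ARC.

```python
import math

def db(ratio):
    return 10.0 * math.log10(ratio)

def dpcm_measures(y, a=0.8, step=0.1):
    """Return (SNR, SPER, SNRQ) in dB for a toy DPCM loop with a fixed predictor."""
    y_hat_prev = 0.0
    p_sig = p_err = p_noise = 0.0
    for s in y:
        pred = a * y_hat_prev            # prediction from the reconstructed sample
        e = s - pred                     # prediction error
        e_q = step * round(e / step)     # uniform quantizer (stand-in)
        y_hat = pred + e_q               # reconstructed sample
        p_sig += s * s
        p_err += e * e
        p_noise += (s - y_hat) ** 2      # same as (e - e_q)**2
        y_hat_prev = y_hat
    return db(p_sig / p_noise), db(p_sig / p_err), db(p_err / p_noise)

y = [math.sin(0.3 * k) for k in range(200)]
snr, sper, snrq = dpcm_measures(y)
print(f"SNR = {snr:.2f} dB, SPER + SNRQ = {sper + snrq:.2f} dB")
```

Because the reconstruction error equals the quantization error of the residual, the two printed values agree. The sketch also shows why the decomposition can mislead: the prediction is formed from reconstructed samples, so a coarser quantizer changes SPER as well as SNRQ.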

3 . 4  SUMMARY 

The design of an adaptive residual coder for coding the compressed
speech was presented. An 11-level quantizer and a 4th-order predictor were
used. A signal-to-quantization-noise ratio as high as 20 dB could be
obtained. Various performance measures to indicate the speech quality
were presented. It was found that no single criterion indicates the true
speech quality. The study of the combined system using TDHS and ARC is
presented in the next chapter.


Block-to-block SNR, SNRQ and SPER for female speaker.


CHAPTER  4


TIME  DOMAIN  HARMONIC  SCALING  (TDHS)  AND  ADAPTIVE
RESIDUAL  CODER  (ARC)  SYSTEM


4.1  INTRODUCTION 

This chapter describes the complete system employing harmonic scaling
and residual coding operations to achieve the desired transmission bit
rate. As discussed earlier, time domain harmonic compression reduces the
number of samples to be transmitted, and the adaptive residual coder
encodes these samples with the least possible distortion. The complete
system performance depends not just on the TDHS and ARC performance but
also on several other factors, such as the source code, the error
protection employed, and the buffer control strategy. These and other
topics were investigated and the results are presented here.

Section 4.2 describes two system configurations: one with pitch as side
information and the other with no side information. Section 4.3 describes
the simulation. The source code and the study of the buffer behavior are
given in Sections 4.4 and 4.5 respectively. The effects of various
transmission bit error rates are examined and reported in Section 4.6. The
system performance for background noise is discussed in Section 4.7.

4.2  SYSTEM  CONFIGURATION 

Fig. 4.1 shows the structure of the TDHS-ARC system. For simplicity the
A/D and D/A converters and associated filters are not shown. The speech
samples, s(k), bandlimited to 3.2 kHz and sampled at the Nyquist rate, form
the input. The pitch period, Np, is extracted from a block of speech
samples using the center-clipped, three-level autocorrelation method. This
value of the pitch period is passed to the time domain harmonic compression
algorithm. This algorithm produces Np samples of compressed speech y(k), as
described in Chapter II. The sampling frequency now becomes half of the
original Nyquist rate. The smoothly varying compressed speech samples are
coded using an Adaptive Residual Coder. The output of the ARC is a string
of quantizer outputs q(k). For every sample, or for every two samples if
they form one of the run lengths in the code, a 4-bit word is transmitted
to the bit buffer. For the bit rate of 9.6 kb/s and the sampling frequency
of 3.2 kHz, the average number of bits per sample transmitted on the
channel is 3. Depending on the Hamming code chosen for error protection,
parity bits are also added to the buffer.
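
The pitch extraction step can be sketched as follows. This is an illustrative outline of three-level center clipping followed by an autocorrelation peak search, not the report's FORTRAN routine. The clipping ratio of 0.6, the search range, and the names `three_level_clip` and `pitch_period` are assumptions; a voiced/unvoiced decision is omitted.

```python
import math

def three_level_clip(block, ratio=0.6):
    """Map each sample to +1, 0 or -1 using a threshold set as a fraction of the block peak."""
    peak = max(abs(x) for x in block)
    c = ratio * peak
    return [1 if x > c else -1 if x < -c else 0 for x in block]

def pitch_period(block, min_lag, max_lag, ratio=0.6):
    """Return the lag in [min_lag, max_lag] that maximizes the clipped autocorrelation."""
    v = three_level_clip(block, ratio)
    best_lag, best_val = min_lag, -1
    for lag in range(min_lag, max_lag + 1):
        # With values in {-1, 0, +1} the products need only sign logic, no multiplier.
        r = sum(v[k] * v[k + lag] for k in range(len(v) - lag))
        if r > best_val:
            best_lag, best_val = lag, r
    return best_lag

# Synthetic voiced-like block with a period of 40 samples (hypothetical test signal).
block = [math.sin(2 * math.pi * k / 40) + 0.5 * math.sin(4 * math.pi * k / 40)
         for k in range(200)]
print(pitch_period(block, 20, 100))  # 40
```

Since the clipped signal takes only three values, the autocorrelation reduces to counting sign agreements and disagreements, which is consistent with the modest hardware requirement claimed for the pitch extractor.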

The length of the buffer was chosen to be 1024 bits, which corresponds
to approximately a tenth of a second of delay at 9.6 kb/s. Depending on the
rate at which bits are generated, the buffer may overflow or underflow. To
avoid such situations, buffer content information is passed on to the ARC
transmitter so that it can take action to control the buffer. This is
explained in Sec. 4.5.

Note  that  the  bits  transmitted  on  the  noisy  channel  represent  only  the 
quantizer  level  information. 

The bits thus transmitted are decoded at the receiver, correcting any
occurrence of a single channel error in a Hamming block. The received
quantizer level q(k) forms the input to the ARC receiver, whose output is
the reconstructed compressed speech $\hat{y}(k)$. The time domain harmonic
expansion is performed on these samples. However, the pitch information is
needed for the frequency multiplication operation. There are two ways to
pass the pitch periods to the harmonic expansion operation. One scheme is
to extract the pitch periods at the transmitter and send them to the
receiver as side information. However, such a scheme has a number of
disadvantages. The biggest disadvantage is the need for a blocking
structure to transmit the quantizer levels. Every 2Np samples are
associated with the pitch period Np. It is very important that every block
of samples at the receiver be associated with the same pitch period as at
the transmitter. This synchronization can easily be lost unless special
error protection is provided for the pitch information. However, such error
protection, as well as the pitch period transmission itself, would cost
considerable bit rate. This would leave fewer bits per sample for
transmitting quantizer levels and would therefore cause more quantization
noise.

In the case of pitch transmission, it has been seen that the matching of
pitch data to the block of quantizer levels, which is itself pitch
dependent, is very sensitive to transmission errors. Besides, for the
9.6 kb/s system, the extra bit rate required for error protection of the
pitch information cannot be afforded. This led to the implementation of
the other scheme, with pitch extraction at the receiver (see Fig. 4.1).
Since the pitch period is estimated from the reconstructed compressed
speech rather than reconstructed original speech, there are fewer pitch
periods per unit time available for pitch extraction. This problem can be
handled by using a smaller block size and search range. Table 4.1 shows the
pitch extracted at the transmitter and at the receiver. Since the pitch at
the two ends is extracted from a different type of signal as well as from a
different number of samples, the comparison should be made for the voiced
and unvoiced speech segments. Remember that the number of samples in any
particular voiced segment of compressed speech is half that in the
corresponding voiced segment of original speech. The pitch estimator does
surprisingly well in voiced regions of the utterance. However, in the
transition regions between voiced and unvoiced speech, the pitch estimate
at the receiver differs significantly from that at the transmitter. A
similar situation occurs at the boundary between two different sounds. The
distortion caused by these situations is audible in the form of buzziness,
which can be heard only through very high quality headphones. It was
thought that such distortion could be tolerated to preserve the robustness
and the simplicity of the system.

TABLE 4.1

Comparison of pitch periods extracted at the transmitter and at the
receiver.

At the Transmitter    At the Receiver       At the Transmitter    At the Receiver
Sample #      Pitch   Sample #      Pitch   Sample #      Pitch   Sample #      Pitch
1099 - 1298   27      545 - 744     27      5333 - 5532   29      2664 - 2863   29
1153 - 1352   82      572 - 771     55      5391 - 5590   30      2693 - 2892   30
1317 - 1516   27      627 - 826     55      5451 - 5650   30      2723 - 2922   30
1371 - 1570   27      682 - 881     55      5511 - 5710   30      2753 - 2952   30
1425 - 1624   82      737 - 936     55      5571 - 5770   30      2783 - 2982   30
1589 - 1788   55      792 - 992     28      5631 - 5830   30      2813 - 3022   30
1699 - 1898   55      820 - 1019    28      5691 - 5890   30      2843 - 3042   30
1809 - 2008   27      848 - 1047    27      5751 - 5950   30      2973 - 3072   30
1863 - 2062   27      875 - 1074    27      5811 - 6010   30      2903 - 3102   30
1917 - 2116   28      902 - 1101    27      5871 - 6070   30      2933 - 3132   30
1973 - 2172   28      929 - 2128    27      5931 - 6130   30      2963 - 3262   30
2029 - 2228   28      956 - 1155    28      5991 - 6190   30      2993 - 3192   30
2085 - 2284   28      984 - 1183    28      6051 - 6250   30      3023 - 3222   30
                      1012 - 1211   29      6112 - 6310   30      3053 - 3252   30
                      1041 - 1240   28      6171 - 6370   30      3083 - 3282   30
                                            6231 - 6430   30      3123 - 3312   30
                                            6291 - 6490   30      3143 - 3342   30
                                            6351 - 6550   30      3173 - 3372   30
                                            6411 - 6610   30      3203 - 3402   30
                                            6471 - 6670   3*      3233 - 3432   30

Fig. 4.2 shows the frequency spectrum of a 20 msec segment of input and
output speech with and without pitch transmission. The speech is not
quantized, in order to focus attention on the distortion caused by the
frequency compression and expansion operations.

4.3  SIMULATION 

A program was written in FORTRAN IV to simulate the Time Domain Harmonic
Scaling, Adaptive Residual Coding, coder, decoder and buffer control
operations. The structure of the simulation is shown in Fig. 4.3. The
transmitter program (TRAN.FTN) consists of the simulation of harmonic
compression, pitch extraction and the adaptive residual transmitter. This
module produces the quantizer data (QUANT.DAT), pitch data (PITCH.DAT), and
output data (FOR006.DAT) files. The output data file holds a record of the
parameters used, the quantizer statistics, and the predictor, quantizer and
ARC transmitter performance. The encoder program reads the quantizer data
from the QUANT.DAT file, codes it, and writes the code words to the
ENCO.DAT file. The encoder program also produces the transmitter buffer
simulation file (TBUFF.DAT). The channel simulation program reads the code
words from the file ENCO.DAT, adds the desired channel errors to the bits,
and writes the corrupted code words to a file called ERCO.DAT. The decoder
program reads this file, decodes the code words into quantizer levels, and
produces the receiver buffer simulation file (RBUFF.DAT) and the decoded
quantizer level file (DECO.DAT). The receiver program combines the ARC
receiver operation, pitch extraction and the time domain harmonic expansion
operation. The reconstructed speech file (SHAT.DAT) and pitch data file
(PITCH.DAT;2) are produced for comparison with those at the transmitter.

Various simulation runs were made for the three utterances described in
an earlier chapter. The combined system performance was evaluated using
the criteria outlined in Chapter 3. It was noticed that none of the
objective measures described there indicates the true quality of the output
speech, particularly when the pitch is extracted at the receiver. Table 4.2
shows the sliding SNR in the time and frequency domains for three
utterances, two by male speakers and one by a female speaker. A negative
value of SNR does not indicate bad quality of the output speech; rather, it
reflects the phase difference between input and output speech. Most of the
decisions regarding speech quality were based on informal listening tests.
Fig. 4.4 shows the time plots of input and output speech for segments of
two utterances. It can be seen that the shape of the input speech waveforms
is preserved.

4.4  CODING

The ARC transmitter produces quantizer levels from 1 to 11. These source
symbols need to be coded into the channel alphabet for transmission. The
source coding must use the bit rate optimally. The probabilities of
occupancy of the quantizer levels are non-uniform. In such a case, the bit
rate is optimally utilized by assigning a variable length code to the
quantizer levels. The average code length of a Huffman code matches the
average entropy very well (Gallager, 1969). However, such a simple variable
length source code tended to cause the buffer to fill up rapidly during
segments of voiced speech. When a sophisticated variable length code, such
as the overfull code in the PARC system [Melsa, et al., 1980], is employed,
the effect of channel errors is severe. This is because a single bit error
can lead to a string of inaccurate data, which is a serious disadvantage
for a robust system. This led to the development of a fixed length code.

With the compressed speech sampling rate of 3200 Hz and the bit rate of
9600 bits per second, the average number of bits per sample is 3. If simple
fixed length 3-bit code words are used, only an 8-level quantizer can be
employed. Fewer quantizer levels cause more quantization noise, thus
deteriorating the output speech quality. Therefore, an 11-level quantizer
was used. This requires a 4-bit code word for fixed length coding. There
are 16 possible code words, of which 11 are used to represent the quantizer
levels and the remaining five are used to represent the run lengths, as
shown in Table 4.3. It can be noted there that a 4-bit code word can
represent either one sample or two samples. Thus, either 4 bits or 2 bits
per sample are transmitted, as opposed to the average of 3 bits per sample.
If the probability of occurrence of a run length is large enough, then the
average number of bits per sample will be 3.

TABLE 4.3

Source code and quantizer level statistics for a typical two second
utterance.

Source      Codewords              Frequency       Probability of
Alphabet    (binary)   (decimal)   of Occupancy    Occupancy
1           0 0 0 0     0          516             0.1215
2           0 0 0 1     1          366             0.0862
3           0 1 0 0     4          351             0.0826
4           0 0 1 0     2          375             0.0883
5           0 1 0 1     5          265             0.0624
6           0 0 1 1     3           78             0.0184
7           0 1 1 0     6           79             0.0186
8           1 0 1 1    11            5             0.0012
9           0 1 1 1     7           26             0.0061
10          1 1 1 1    15            0             0.0000
11          1 1 1 0    14            7             0.0016
1,1         1 0 0 0     8          920             0.2166
2,1         1 0 1 0    10          402             0.0946
2,2         1 1 0 1    13          248             0.0584
3,1         1 0 0 1     9          432             0.1017
3,3         1 1 0 0    12          178             0.0419

Probability of runlength occurring = 0.5132
Probability of no runlength = 0.4868

Let the probability of occurrence of a run length be p. Hence (1-p) is
the probability of no run length occurring. Hence

$$\text{Average number of samples/bit} = \frac{p \cdot 2}{4} + (1-p)\,\frac{1}{4} = \frac{1}{4}(1+p).$$

It is required that the average number of bits per sample should not
exceed three, or equivalently that the average number of samples per bit
should be at least one third. Hence

$$\frac{1}{4}(1+p) \ge \frac{1}{3} \quad \text{or} \quad p \ge \frac{1}{3},$$

that is, the probability of a run length should be at least 0.33.
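
The relation between the run length probability and the average bit rate can be checked with a few lines of arithmetic. The function name `bits_per_sample` is illustrative; the value 0.5132 is the run length probability reported in Table 4.3.

```python
def bits_per_sample(p_run):
    """Average bits per sample when a fraction p_run of the 4-bit words carries two samples."""
    samples_per_bit = (1.0 + p_run) / 4.0
    return 1.0 / samples_per_bit

for p in (0.0, 1.0 / 3.0, 0.5132, 1.0):
    print(f"p = {p:.4f}: {bits_per_sample(p):.3f} bits/sample")
```

At p = 1/3 the rate is exactly 3 bits per sample. At p = 0.5132 it falls to about 2.64 bits per sample, so on average such an utterance drains the transmitter buffer rather than filling it.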

If the run lengths occur at least one third of the time, the average
bit rate is maintained. The quantizer and predictor parameters were
designed such that the occupancy of the lower quantizer levels is high
enough to maintain the probability of run lengths around 0.33. Table 4.3
gives the statistics of the quantizer levels, run lengths and code words.
All the code words are 4 bits long, and this prevents the propagation of
transmission errors. However, samples can be added or deleted because of
such errors. Table 4.4 lists all the possible transitions for the quantizer
levels when a single bit error occurs. The third column shows the
probability of a transition from run length to non-run length and vice
versa. To find the average addition or deletion of samples at the receiver
due to channel errors, consider the probabilities of the quantizer levels
and those of the transitions given in Table 4.4.

Let e be the bit error rate and $p_i$ be the probability of the ith
quantizer level. Similarly, the run length probabilities are $p_{1,1}$,
$p_{2,1}$, $p_{2,2}$, $p_{3,1}$ and $p_{3,3}$.

The samples added at the receiver per four bits transmitted

$$= e \left[ \tfrac{1}{4}\{p_1+p_2+p_3+p_4+p_5+p_8+p_{10}+p_{11}\} + \frac{p_8}{4} + \frac{p_{11}}{4} - \tfrac{1}{2}\{p_{1,1}+p_{2,1}+p_{2,2}+p_{3,1}+p_{3,3}\} + \frac{p_{1,1}}{4} - \frac{p_{2,1}}{4} \right]$$

or

$$= e \left[ \frac{1-p}{4} - \frac{p_6}{4} - \frac{p_7}{4} - \frac{p_9}{4} + \frac{p_8}{4} + \frac{p_{11}}{4} - \frac{p}{2} + \frac{p_{1,1}}{4} - \frac{p_{2,1}}{4} \right]$$

$$= e \left[ \frac{1-p}{4} - \frac{p}{2} + \frac{1}{4}\{p_8+p_{11}-p_{2,1}-p_6-p_7-p_9+p_{1,1}\} \right]$$

where

p = probability of run length
(1-p) = probability of no run length

Consider a specific example, say sentence 1: this number comes to 0.0006
samples for a BER of 1%, or approximately 1.25 samples per second.

TABLE 4.4

Quantizer    Possible quantizer           Probability
Level        level after error
1            2, 4, 3, (1,1)               1/4
2            1, 6, 5, (2,1)               1/4
3            5, 7, 1, (3,1)               1/4
4            6, 1, 7, (1,2)               1/4
5            3, 9, 2, (1,3)               1/4
6            4, 2, 9, 8                   0
7            9, 3, 4, 11                  0
8            (1,2), (2,1), 10, 6          1/2
9            7, 5, 6, 10                  0
10           11, (1,3), 8, 9              1/4
11           10, (3,1), (1,2), 7          1/2
1,1          (2,1), (1,2), (3,1), 1       1/4
2,1          8, (1,1), 11, 4              3/4
2,2          (3,1), 10, (2,1), 5          1/2
3,1          2, 8, (1,3), (1,1)           1/2
3,3          3, 11, (1,1), (1,3)          1/2
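
The transition probabilities of Table 4.4 follow directly from the code words of Table 4.3: flipping each of the four bits in turn gives the four possible received words, and the probability column is the fraction of those flips that turns a single-sample word into a run length word or the reverse. The sketch below reproduces that calculation. The dictionary and function names are illustrative, and the run length symbols are labeled as in Table 4.3, so the printed neighbor labels may differ from those used inside Table 4.4 even though the run/non-run classification is the same.

```python
# Code words from Table 4.3, keyed by source symbol; run length symbols contain a comma.
CODE = {
    "1": 0b0000, "2": 0b0001, "3": 0b0100, "4": 0b0010, "5": 0b0101,
    "6": 0b0011, "7": 0b0110, "8": 0b1011, "9": 0b0111, "10": 0b1111,
    "11": 0b1110, "1,1": 0b1000, "2,1": 0b1010, "2,2": 0b1101,
    "3,1": 0b1001, "3,3": 0b1100,
}
SYMBOL = {word: sym for sym, word in CODE.items()}

def is_run(sym):
    return "," in sym

def single_error_neighbors(sym):
    """Symbols received when exactly one of the four bits is inverted."""
    return [SYMBOL[CODE[sym] ^ (1 << bit)] for bit in range(4)]

def class_change_probability(sym):
    """Fraction of single-bit errors that add or delete a sample."""
    changed = sum(is_run(n) != is_run(sym) for n in single_error_neighbors(sym))
    return changed / 4

for sym in CODE:
    print(sym, single_error_neighbors(sym), class_change_probability(sym))
```

Levels 6, 7 and 9 have no run length word at Hamming distance one, which is why their entries are 0 and why they drop out of the expression for the samples added.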

The bit stream in this system represents only the quantizer levels.
If this information is transmitted over a noisy channel with a BER as high
as 0.1% without error protection, the output speech quality is slightly
degraded. Higher BERs require error protection. (57,63) and (26,31)
single-error-correcting Hamming codes were tried. The output speech quality
was found to be much better with the (26,31) code. However, some speech
quality in the absence of transmission errors is sacrificed.

4.5  BUFFER  CONTROL 

It was noted in previous sections that the code employed in the system
generates a variable number of bits per sample. This causes the buffer
content to fluctuate considerably. Since the buffer is of fixed length
(1024 bits in this case), it is important to monitor the buffer behavior
and to control it when the buffer approaches overflow or underflow. For
example, in unvoiced and silent segments of speech, more and more run
lengths of quantizer levels are formed, thus generating only 2 bits per
sample as against the average rate of 3 bits per sample. If such a
situation continues for a long time, the buffer may eventually run out of
bits. The buffer overflow situation may occur in voiced segments of speech,
where 4 bits per sample are generated while only 3 bits per sample are
removed from the buffer.

The simulation of the bit buffer at the transmitter (or the sample buffer
at the receiver) is carried out by keeping count of the net number of bits
(or samples) added to the buffer. The bit rate required for error
protection is taken into account by adding (n-m) parity check bits to the
transmitter buffer for every m information bits. The sample buffer at the
receiver is not affected by the error protection bits. The following
equations describe the implementation of the buffer simulation.

$$b(k) = b(k-1) + r(k) - \bar{r} \tag{4.1}$$

where

b(k) = current buffer content
k = time instant
r(k) = instantaneous bit rate = 4 bits/sample or 4 bits/2 samples
$\bar{r}$ = average bit rate = $\frac{m \times 3}{n}$ for an (m, n) Hamming code

For no run length

$$b(k) = b(k-1) + 4 - \bar{r} \tag{4.2}$$

For run length

$$b(k) = b(k-1) + 4 - 2\bar{r} \tag{4.3}$$

Similarly, the receiver sample buffer simulation is expressed by

$$s(k) = s(k-1) + 0.25 - 0.333 \tag{4.4}$$

or

$$s(k) = s(k-1) + 0.50 - 0.333 \tag{4.5}$$

Equation (4.4) is for the "no run length" case and Eq. (4.5) expresses the
"run length" situation. On average, one sample per 3 bits, or 1/3 sample
per bit, is added to the sample buffer. All the code words are four bits
long. The number of samples added to the buffer per code word is either one
or two, i.e. 0.25 or 0.5 samples per bit.
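
The transmitter buffer recursion can be sketched in a few lines. This is a minimal illustration of Eqs. (4.2) and (4.3), not the report's FORTRAN code; the function names and the use of a list of booleans to mark run length words are assumptions.

```python
def coded_rate(m, n, bits_per_sample=3.0):
    """Information bits removed per sample when an (m, n) Hamming code shares the channel."""
    return bits_per_sample * m / n

def transmitter_buffer(run_flags, r_bar=3.0, start=0.0):
    """Buffer content after each 4-bit code word; run_flags[k] is True for a run length word."""
    b = start
    history = []
    for is_run in run_flags:
        samples = 2 if is_run else 1      # a run length word carries two samples
        b += 4.0 - samples * r_bar        # Eq. (4.2) or Eq. (4.3)
        history.append(b)
    return history

print(transmitter_buffer([False, False, True]))  # [1.0, 2.0, 0.0]
print(round(coded_rate(26, 31), 3))              # 2.516
```

Without error protection each single-sample word adds a net bit and each run length word removes two, so the content is stationary when one word in three is a run length word. With the (26,31) code the removal rate drops to about 2.5 bits per sample, which pushes the buffer toward overflow.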

As mentioned earlier, the transmitter buffer tends to underflow when a
large number of run lengths of quantizer levels is generated. Buffer
underflow can be prevented by switching the code when only a small number
of bits is left in the buffer. Such switching can be accomplished by not
employing run lengths when the buffer content drops below a certain
threshold. In this case, the buffer content is incremented by one bit for
every quantizer level. The buffer underflow control can be seen in Fig.
4.5, which shows the transmitter and receiver buffer behavior for an
utterance spoken by a male speaker. When the bit count reaches one hundred,
switching of the code occurs and the bit count remains near or above the
threshold.

The buffer overflow situation usually happens in voiced segments of
speech. The probability of run lengths is small when the outer quantizer
levels are frequently generated. In such a case, there is a net addition of
one bit for every sample, eventually causing overflow. Buffer overflow
should be avoided since it could cause a loss of information. To prevent
such a situation, a method had to be found to sharply limit the bit
generation rate occasionally without causing an unreasonable amount of
distortion. One such method is to change the quantizer level thresholds
gradually as the buffer fills beyond certain thresholds. In the
simulations, three buffer thresholds and three scalars to change the
quantizer level thresholds were specified. The way the quantizer changes as
the buffer crosses these thresholds is shown in Fig. 4.6. This, of course,
introduces more quantization noise and hence distortion. The performance of
the ARC system with buffer control is outlined in Table 4.5. It can be seen
that the distortion is introduced gradually. The transmitter buffer
behavior with buffer control can be seen in Fig. 4.7. The number of bits in
the buffer is limited to 1024. Clipping of the buffer content at this value
indicates a buffer overflow situation. With the buffer control strategy
described above, more of the lower quantizer levels are generated, thus
increasing the run length probability. Table 4.6 shows the statistics of
the quantizer with and without buffer control. The frequency of occupancy
of the lower levels is indeed increased, and hence so is the probability of
run lengths. The advantage of this scheme is its simplicity, since the
receiver need not know the quantizer thresholds.
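
The two buffer control rules can be sketched as simple decision functions. The underflow threshold of 100 bits is the value quoted in the text. The three overflow thresholds and three scalars below are hypothetical placeholders, since the report does not list the values used, and the idea that the scalars multiply the quantizer decision thresholds is an assumption made for illustration.

```python
def use_run_length(buffer_bits, underflow_threshold=100):
    """Run length words are disabled while the buffer holds fewer bits than the threshold."""
    return buffer_bits >= underflow_threshold

def threshold_scalar(buffer_bits, levels=(512, 768, 896), scalars=(1.25, 1.5, 2.0)):
    """Factor applied to the quantizer decision thresholds for the current buffer content.

    levels and scalars are hypothetical values, listed in increasing order.
    """
    factor = 1.0
    for level, scalar in zip(levels, scalars):
        if buffer_bits > level:
            factor = scalar
    return factor

for bits in (50, 300, 600, 800, 1000):
    print(bits, use_run_length(bits), threshold_scalar(bits))
```

Widening the decision thresholds makes more residual samples fall into the inner quantizer levels, which is what raises the run length probability. Only the transmitter applies the scalars, and the receiver decodes the same level indices as before.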

4.6  TRANSMISSION  ERRORS 

Many toll quality speech links maintain bit-error rates (BER) which
are too small (less than 10"^) to affect the quantizer and hence the
coder performance. However, a BER of one tenth of a percent is not
uncommon, and for bad channels this rate could be as high as one percent.
It is important to determine the extent of the degradation and, if
possible, how to minimize it.

A FORTRAN simulation program was written to study the effects of
transmission errors. The bit manipulations required for an exact simulation
of the real situation are rather difficult to accomplish in FORTRAN.
However, the following steps in the simulation procedure represent the
real-time situation very closely.

1.  The encoder program reads a block of 256 quantizer levels,
converts them into code words and writes them into a file. This file is
always shorter than the quantizer level file because of run lengths.

2.  The channel simulation program reads the code words and asks for
the BER at run time. The block of code words is chosen according to the
choice of n in the (m, n) Hamming code. For example, for the (57,63)
single-error-correcting Hamming code, a block of 16 code words is
considered. Errors are added to the bit stream according to the specified
bit error rate. A single error is corrected, a double or other even number
of errors is passed uncorrected, and for three or more (odd) errors an
additional error is introduced. The single-error-corrected code words are
written to a separate file. The decoder program reads this file and
produces a quantizer level file, which forms the input to the receiver
program.
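
The channel model of step 2 can be sketched as follows. This mirrors the simplified rule stated above (one error corrected, an even number passed through, an odd number of three or more gaining one extra error) rather than a real syndrome decoder. The function names are illustrative, and the choice of which additional bit to invert is an assumption.

```python
import random

def corrupt_block(bits, ber, rng):
    """Invert each bit independently with probability ber; return the block and error positions."""
    errors = [i for i in range(len(bits)) if rng.random() < ber]
    out = list(bits)
    for i in errors:
        out[i] ^= 1
    return out, errors

def apply_decoder_model(sent, received, errors, rng):
    """Simplified single-error-correction behaviour used in the simulation description."""
    count = len(errors)
    if count == 1:
        return list(sent)                  # single error corrected
    if count % 2 == 0:
        return list(received)              # no error or an even number: passed as received
    out = list(received)                   # three or more (odd): one more error introduced
    clean = [i for i in range(len(sent)) if i not in errors]
    out[rng.choice(clean)] ^= 1            # position chosen at random (assumption)
    return out

rng = random.Random(1)
sent = [0] * 63                            # one block for the (57,63) code
received, errors = corrupt_block(sent, 0.01, rng)
decoded = apply_decoder_model(sent, received, errors, rng)
print(len(errors), sum(decoded))
```

A seeded generator keeps a run repeatable, which helps when the same error pattern has to be replayed with and without error protection.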

The simulation described was carried out with the parameters designed
to minimize the effect of transmission errors [Melsa, et al., 1980].
Because of the addition and deletion of samples due to channel errors, the
system performance cannot be reliably evaluated using objective measures.
Informal listening tests were used to assess the performance. The output
speech is distorted for two reasons: first, errors in the quantizer levels,
and second, wrong pitch extraction. As mentioned earlier in this chapter,
the pitch is extracted from the reconstructed compressed speech,
$\hat{y}(k)$. If $\hat{y}(k)$ is changed significantly by the channel
errors, the wrong pitch might be extracted. Table 4.8 shows the pitch
extracted at the transmitter and at the receiver with and without channel
errors. It also lists the pitch periods extracted after error correction.
The effect of 1% channel errors on the pitch extraction is not significant.
Besides, the distortion caused by the use of a wrong pitch period is masked
by that due to the channel errors. After error correction at the receiver,
the pitch is generally extracted correctly, and thus pitch extraction at
the receiver does not cause any severe degradation in the presence of
channel noise.

In the previous section, the effects of the [26,31] and [57,63] single-error-correcting Hamming codes on the buffer behavior were discussed. Allocating a significant part of the bit rate to error protection penalizes the error-free performance, because the buffer overflow control strategy is used repeatedly. However, the system performance in the presence of channel noise is greatly improved. The extent of this improvement can be determined by finding the BER after single error correction with the Hamming code.

Let [m,n] be the single-error-correcting Hamming code and e be the BER. If a single error in a frame is corrected, the residual frame error probability becomes $1-(1-e)^{n}-n\,e\,(1-e)^{n-1}$. This can be equated to the frame error probability at an equivalent BER, $\delta$, without any error correction. This is expressed in the following equation.

$$1-(1-\delta)^{n} = 1-(1-e)^{n}-n\,e\,(1-e)^{n-1} \tag{4.5}$$

Let n = 31 and e = 0.01, i.e. a BER of 1%:

$$
\begin{aligned}
1-(1-\delta)^{31} &= \left[1-(0.99)^{31}\right]-31\,(0.01)\,(0.99)^{30}\\
&= 0.2676-0.2293 \approx 0.03839\\
(1-\delta)^{31} &= 0.9616\\
31\,\ln(1-\delta) &= \ln 0.9616\\
\ln(1-\delta) &= -0.00126\\
1-\delta &= 0.9987 \quad\text{or}\quad \delta \approx 0.0012
\end{aligned}
$$

i.e. a 1% BER with single error correction using the [26,31] Hamming code is equivalent to a 0.12% BER without any correction. Similar calculations and the simulation results, for different BERs and Hamming codes, are outlined in Table 4.9. It can be seen that the [26,31] single-error-correcting Hamming code is quite effective against transmission errors. While studying Table 4.9 it should be kept in mind that the frame sizes used in the simulation studies were 64 and 32 bits instead of the actual 63 and 31 bits respectively. It should also be noted that, in the simulation studies, additional errors were introduced if three or more odd numbers of errors were detected in a frame.
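The equivalent-BER calculation of Eq. (4.5) is easy to repeat for other codes and error rates. The following is a small sketch, with hypothetical function names, that solves the equation for the equivalent BER in closed form. It assumes independent bit errors and the nominal frame length n, so it reproduces the theoretical figures rather than the simulated ones (the simulation used 64- and 32-bit frames).

```python
def residual_frame_error(e, n):
    """Probability that an n-bit frame contains two or more bit errors."""
    return 1.0 - (1.0 - e) ** n - n * e * (1.0 - e) ** (n - 1)


def equivalent_ber(e, n):
    """BER without correction that gives the same frame error probability.

    Solves 1 - (1 - d)**n = residual_frame_error(e, n) for d.
    """
    residual = residual_frame_error(e, n)
    return 1.0 - (1.0 - residual) ** (1.0 / n)


if __name__ == "__main__":
    for e, n in [(0.01, 31), (0.01, 15), (0.001, 63)]:
        print(f"e = {e:.3%}, n = {n}: equivalent BER = {equivalent_ber(e, n):.4%}")
```

For e = 0.01 and n = 31 this gives about 0.126%, in line with the 0.12% worked out above.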

It was mentioned earlier that transmission errors cause addition or deletion of samples. This happens because a four-bit code word represents either one or two samples. If a channel error changes a code word so that it represents two samples instead of one, samples are added at the receiver. Samples are deleted if the reverse happens. In the earlier discussion of buffer control, it was pointed out that the receiver buffer tends to overflow during silence and to underflow during voiced regions. A desirable effect of a channel error would be the addition of samples. This might cause deletion of silence or prevent underflow of the buffer during voiced segments. For the two-second utterance, the effect of channel errors on the receiver buffer behavior is not very pronounced; therefore, a BER of 5% was considered. The receiver buffer plot for a five percent bit error rate is shown in Fig. 4.8. It can be clearly seen that the net effect of channel errors is to add samples to the buffer for the code word assignment in Table 4.3.

TABLE 4.9

Sentence 1: male speaker, 2 second utterance, sampling frequency = 3200 Hz

                   After single error correction   After double error correction
BER     Hamming    Theoretical     Simulated       Theoretical     Simulated
        code       BER             BER             BER             BER

0.1%    [57,63]    0.0029%         0.074%          0.097%          0.092%
        [26,31]    0.0014%         0.074%          0.0985%         0.092%
1.0%    [57,63]    0.26%           0.502%          0.714%          0.553%
        [26,31]    0.137%          0.307%          0.85%           0.713%
5.0%    [57,63]    2.77%           4.8%            2.15%           3.6%
        [26,31]    1.98%           4.16%           2.42%           3.31%

4.7  BACKGROUND  NOISE 

The system performance was evaluated for male and female utterances spoken in quiet rooms without any background noise, so the input to the coder was undistorted. However, this is an unlikely situation, since there will almost always be background interference such as other speakers, typewriter noise and the like whenever a speaker is using the "digital telephone". It was thought that periodic background noise would be the worst kind of noise for the TDHS-ARC system, since the periodicity of the input signal is used for the harmonic scaling operations.

The study of background noise is very simple once the algorithm has been implemented in real time. It is just a matter of talking into a handset with noise in the background and listening to the output through headphones. In the FORTRAN simulation, however, the task is not so straightforward: a digital speech file with background noise is needed. Multispeaker files were therefore created by adding two digital speech files with appropriate weights.

In the simulation, the multispeaker file was generated by adding female speech to male speech as in Eq. (4.6),

$$\text{Composite} = S_{11} + k\,S_{1} \tag{4.6}$$

where k takes values from 0 to 1, thus giving varying degrees of background noise. It was noticed that the pitch extraction loop picks the pitch of the dominant speaker (see Table 4.10) at each short-time interval. Due to the masking properties of the ear, the harmonic distortion in the nondominant speech is not heard. The ARC and total system performance for the composite speakers is shown in Table 4.11. As long as the pitch tracking algorithm does not break down, this system should perform very well in background noise.
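The weighted addition of Eq. (4.6) amounts to a sample-by-sample mix of two speech files. A minimal sketch follows; the function name, the truncation to the shorter file and the clipping to a 16-bit sample range are assumptions made for illustration and are not taken from the report.

```python
def mix_speakers(primary, background, k, limit=32767):
    """Return primary + k * background, sample by sample.

    ASSUMPTIONS: both inputs are sequences of integer samples, the result
    is truncated to the shorter input, and values are clipped to a signed
    16-bit range.
    """
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must lie between 0 and 1")
    n = min(len(primary), len(background))
    mixed = []
    for i in range(n):
        value = int(round(primary[i] + k * background[i]))
        mixed.append(max(-limit - 1, min(limit, value)))
    return mixed
```

With k = 0 the primary speaker is returned unchanged, and k = 0.25 or 0.5 corresponds to the composite files listed in Table 4.11.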

4.8  SUMMARY 

The system presented in this chapter produces high quality speech at the transmission rate of 9.6 kb/s. The speech quality, however, depends on the bit rate used for error protection. For example, the use of the [26,31] single-error-correcting Hamming code results in excellent system performance at a bit error rate of 1%. However, the noise-free performance is degraded because of coarse quantization and/or buffer control operations. The quantizer level runlength is used effectively to permit a fixed codeword size. Switching the code and changing the quantizer thresholds were found to be effective strategies for controlling buffer underflow and overflow respectively. High bit error rates do affect the pitch extraction at the receiver. However, the distortion caused by an improper pitch period was found to be masked by the distortion due to transmission errors. The system also behaves very well for simultaneous speakers.

TABLE 4.11

Performance of ARC and TDHS-ARC system for multispeaker files

                ARC                                    TDHS-ARC
Sentence #      SNR (dB)    SPER (dB)    SNRQ (dB)     SEGSNR (dB) in freq. domain

S11             15.29       5.68         9.61          13.02
S11+.25S1       16.82       6.09         10.73         11.91
S11+.5S1        17.08       6.11         10.97         11.51
S1              13.98       4.60         9.38          8.32


CHAPTER  5 


THE  16  KB/S  SYSTEM 


5.1 INTRODUCTION

In the earlier chapters, the TDHS-ARC system for the bit rate of 9.6 kb/s was presented. The same system design can be extended to a bit rate of 16 kb/s with a few modifications. The speech quality and the robustness of the 16 kb/s system are improved by the availability of more bits per sample for coding. The speech quality is improved by reducing the quantization noise, which can be done by increasing the number of quantizer levels. The amount of error protection available determines the robustness of the system. These two basic issues concerning the 16 kb/s system are discussed in detail in this chapter. The other aspects of the system design, such as the system configuration, the buffer control strategy and the type of source code, remain the same.

Section 5.2 describes the source and the channel code used in this system. The choice of parameters in the ARC design and the buffer behaviour are discussed in Section 5.3. The effect of transmission errors is also discussed in this section. The results are summarized in Section 5.4.

5.2  SOURCE  AND  CHANNEL  CODING 

With a bit rate of 16 kb/s and a sampling frequency of 3.2 kHz for the compressed speech, an average of 5 bits per sample is available for coding. A 31-level quantizer and 5-bit fixed-length codewords were used. The system performance was found to be excellent and no distortions could be heard in the informal listening tests. However, this coding scheme is possible only if no error protection is desired. For the noisy channel, a 21-level quantizer with a 5-bit fixed-wordsize, variable-input code was designed. Out of the 32 possible codewords, 21 are used to represent the quantizer levels and the remaining 11 are used to represent the runlengths. The lower quantizer levels (for example: 1, 2, 3, 4) form the runlengths. The code designed to represent the quantizer levels and the runlengths is shown in Table 5.1. The Hamming distance between the codewords representing adjacent quantizer levels is kept as small as possible. With the choice of codewords shown in Table 5.1, the buffer empties by 2.5 bits per sample every time a runlength occurs, while there is no net addition to the buffer otherwise. If error protection is used, check bits are added to the buffer after every fixed number of information bits. This may cause a buffer overflow, depending on the amount of error protection used. Such situations may be avoided by adjusting the probability of occurrence of runlengths.
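The distance property claimed for the codeword assignment can be checked directly from Table 5.1. The sketch below transcribes the table into Python; the names are hypothetical. Three runlength entries (2,2), (3,3) and (5,5) are illegible in the scanned table and are inferred here as the only three 5-bit words left unused, which is an assumption. The observation that the codewords two levels apart differ in exactly one bit (as if the leading bit separated two interleaved Gray-coded sequences) is a reading of the table, not a statement from the report.

```python
# Quantizer levels 1..21 in table order.
LEVEL_CODEWORDS = [
    "00000", "10000", "00001", "10001", "00011", "10011", "00010",
    "10010", "00110", "10110", "00111", "10111", "00101", "10101",
    "00100", "10100", "01100", "11100", "01101", "11101", "01111",
]

# Runlength pairs; (2,2), (3,3) and (5,5) are inferred (see lead-in).
RUN_CODEWORDS = {
    (1, 1): "01000", (1, 2): "11000", (1, 3): "01001",
    (2, 1): "11001", (2, 2): "11011", (2, 3): "11010",
    (3, 1): "01011", (3, 2): "01010", (3, 3): "01110",
    (4, 4): "11110", (5, 5): "11111",
}


def hamming_distance(a, b):
    """Number of bit positions in which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def neighbour_distances(step):
    """Distances between the codewords of level k and level k + step."""
    return [
        hamming_distance(LEVEL_CODEWORDS[i], LEVEL_CODEWORDS[i + step])
        for i in range(len(LEVEL_CODEWORDS) - step)
    ]


if __name__ == "__main__":
    print("level k vs k+1:", neighbour_distances(1))
    print("level k vs k+2:", neighbour_distances(2))
```

Consecutively numbered levels differ in at most two bits, and levels two apart always differ in exactly one bit, so a single bit error in a level codeword most often moves the decoded value to a nearby level.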

If the probability of occurrence of a runlength is p, then (1-p) is the probability of no runlength occurring. Let [m,n] be the Hamming code used, where m is the number of information bits and (n-m) is the number of check bits. The bit rate available to code the quantizer levels becomes 16 m/n kb/s and the average number of bits per sample is 5 m/n. With the above probabilities, each 5-bit codeword carries two samples with probability p and one sample with probability (1-p), so the average number of samples per bit is

$$\frac{2p}{5} + \frac{1-p}{5} = \frac{1}{5}\,(1+p).$$

It is required that the average number of bits per sample be equal to or less than 5m/n, or that the average number of samples per bit be equal to or more than n/(5m).


TABLE 5.1

The Codeword Assignment

Source Alphabet    Codeword

 1                 00000
 2                 10000
 3                 00001
 4                 10001
 5                 00011
 6                 10011
 7                 00010
 8                 10010
 9                 00110
10                 10110
11                 00111
12                 10111
13                 00101
14                 10101
15                 00100
16                 10100
17                 01100
18                 11100
19                 01101
20                 11101
21                 01111
1,1                01000
1,2                11000
1,3                01001
2,1                11001
2,2                11011
2,3                11010
3,1                01011
3,2                01010
3,3                01110
4,4                11110
5,5                11111

Hence

$$\frac{1}{5}\,(1+p) \;\ge\; \frac{n}{5m} \tag{5.1}$$

or

$$p \;\ge\; \frac{n-m}{m} \tag{5.2}$$


For the [11,15] and [26,31] Hamming codes this probability should be equal to or greater than 0.36 and 0.19 respectively. That is, runlengths should occur 36% or 19% of the time to maintain the average number of bits per sample. The quantizer and the parameters in the ARC system were designed to satisfy Eq. (5.2). Table 5.2 shows the statistics and the parameters used for the two Hamming codes. The block-to-block SNR for ARC is plotted for both parameter sets in Fig. 5.1. It can be observed from this figure that heavy error protection leads to more quantization noise, since less bit rate is available for coding the quantizer levels.
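The runlength requirement can be checked numerically. Taking the average number of samples carried by one 5-bit codeword as 1 + p, the condition that at least n/(5m) samples are coded per bit reduces to p >= (n - m)/m. The short sketch below, with hypothetical function names, evaluates that bound for the two codes discussed.

```python
def min_runlength_probability(m, n):
    """Smallest runlength probability p that keeps the buffer balanced.

    Follows from (1 + p) / 5 >= n / (5 * m), i.e. p >= (n - m) / m.
    """
    return (n - m) / m


def average_bits_per_sample(p, word_bits=5):
    """Average bits spent per sample when a fraction p of the codewords
    carries a runlength of two samples."""
    return word_bits / (1.0 + p)


if __name__ == "__main__":
    for m, n in [(11, 15), (26, 31)]:
        p = min_runlength_probability(m, n)
        print(f"[{m},{n}]: p >= {p:.2f}, "
              f"bits/sample = {average_bits_per_sample(p):.3f}")
```

The bound evaluates to about 0.36 for the [11,15] code and 0.19 for the [26,31] code, matching the figures quoted in the text, and at the bound the average number of bits per sample equals 5m/n.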

It was discussed for the 9.6 kb/s system that a compromise must be made between system robustness and speech quality. The aim in this project is good system performance at a bit error rate (BER) of 1%. If the [11,15] single-error-correcting Hamming code is used and the computations of Section 4.6 are carried out, a 1% BER with single error correction becomes equivalent to a 0.06% BER with no error correction. For the [26,31] Hamming code, a 1% BER becomes equivalent to a 0.12% BER with no error correction. In the simulation studies it was found that the effect of a BER of 0.06% was hardly noticeable.


5.3  BUFFER  BEHAVIOUR 

There are two buffers in the system: the transmitter buffer, which is a bit buffer, and the receiver sample buffer. The simulation of these buffers is carried out in the same way as in the 9.6 kb/s system. In the no-runlength case, 5 bits are added to the buffer while the same number of bits are taken out of the buffer. Thus, there is no net addition to the buffer.

In the runlength (of two samples) case, 5 bits are added whereas 10 bits are taken out of the buffer. In addition to this, (n-m) check bits are added to the buffer for every m information bits.

The decoder at the receiver decodes the frame of bits, makes the corrections if any, and puts the samples in the receiver buffer. For the 16 kb/s system, the average number of samples per bit taken out of the receiver buffer is 1/5. If the [11,15] Hamming code is used, then for every frame of 15 bits 3 samples are taken out and between 2.2 and 4.4 samples are added to the buffer. To work with an integer number of samples, the receiver buffer calculations are done every 75 bits (about 4.69 msec at 16 kb/s) for the [11,15] code and every 155 bits (about 9.69 msec) for the [26,31] code. The transmitter and the receiver buffer plots are shown in Fig. 5.2(a) and (b) for both the codes. For the same set of parameters the transmitter buffer fills faster for the [11,15] code than for the [26,31] code, thus requiring buffer control more frequently.
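The transmitter buffer bookkeeping described in this section can be sketched as follows. This is a simplified illustration and not the report's simulation: the function name is hypothetical, the (n-m) check bits are spread pro rata over the codewords instead of being added once per m information bits, and no underflow or overflow control is modelled.

```python
def transmitter_buffer_trace(run_flags, m, n, start_bits=0.0, word_bits=5):
    """Track transmitter buffer occupancy (in bits) codeword by codeword.

    run_flags : iterable of booleans, True when the codeword is a
                runlength covering two samples.
    m, n      : information bits and total bits of the [m,n] Hamming code.
    """
    # Each codeword brings its own bits plus its share of the check bits.
    bits_in = word_bits * n / m
    occupancy = float(start_bits)
    trace = []
    for is_run in run_flags:
        samples = 2 if is_run else 1
        # The channel removes word_bits bits per sample period.
        occupancy += bits_in - word_bits * samples
        trace.append(occupancy)
    return trace


if __name__ == "__main__":
    no_runs = [False] * 10
    print(transmitter_buffer_trace(no_runs, 11, 15)[-1])
    print(transmitter_buffer_trace(no_runs, 26, 31)[-1])
```

With no runlengths the occupancy grows by 5(n-m)/m bits per codeword, which is larger for the [11,15] code than for the [26,31] code, consistent with the remark that the buffer fills faster with the heavier error protection.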

5.4  SUMMARY 

A speech coder was developed for transmission of speech at the bit rate of 16 kb/s using time-domain harmonic scaling and adaptive residual coding. The system configuration is the same as for the 9.6 kb/s system. It was decided to avoid pitch transmission, since this retains the simplicity of the system. Besides, the distortion introduced by extracting the pitch information at the receiver is masked by the quantization noise. The bit rate which would have been spent on transmitting and error protecting the pitch information could be employed to make the system more robust to channel noise. Thus, the use of the [11,15] Hamming code was possible. A segmental SNR of more than 20 dB was achieved using the 21-level quantizer.

The variable-input code of 5-bit fixed wordsize was found to be useful, since the transmission error effects do not get magnified.

[Figure: Transmitter and receiver buffer plots for a male speaker using [11,15] Hamming code]

An attempt was made to keep the structure of the 16 kb/s system as simple as that of the 9.6 kb/s system. No side information is transmitted; hence there is no blocking of data and no block synchronization problem. The frequency compression and expansion operations by a factor of two are carried out using a triangular window. The hardware implementation of the system should be the same as for the 9.6 kb/s system except for the quantizer design and the codeword size.


CHAPTER  6 


SUMMARY 

A speech coder was developed using a new approach that combines frequency scaling in the time domain with adaptive residual coding. The computer simulations of the system were kept very close to the real situation. The speech quality was found to be excellent. The overall performance of the system could not be measured satisfactorily using existing objective performance measures; in such cases, the output speech quality was evaluated using informal listening tests. There are, however, various objective measures, listed in Chapter 3, for evaluating the performance of DPCM coders. Various block-to-block SNR plots indicate that the adaptive quantizer and the predictor perform very well. Such excellent performance of the ARC system is possible because of the smoothly varying frequency-compressed input signal.

This system is designed for a bit error rate of 1%. However, it can withstand bit error rates higher than 5%. Under such severe channel conditions the output speech is considerably distorted, but the algorithm does not diverge. In the simulation studies, it was noticed that a compromise must be reached between robustness and error-free coder performance. With a telephone modem assuring bit error rates of less than 10^-3, the output speech quality is very close to toll quality. The effect of channel errors is reduced, particularly in this system, by the averaging operations performed at the receiver to obtain the expanded speech. The transmission error effects do not get magnified because of the fixed word size of the codewords. In many coders for the bit rate of 9.6 kb/s, entropy coding is used. Variable-length codes, such as the Huffman code, though they utilize the available bit rate optimally, are very inefficient if channel errors occur: one bit error could cause a string of wrong code words to be decoded.
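The error propagation of a variable-length code is easy to demonstrate. The following sketch uses a hypothetical four-symbol prefix code (not a code from the report) and flips a single bit in the encoded stream.

```python
# Hypothetical prefix code, chosen only for illustration.
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}


def encode(symbols):
    """Concatenate the codewords of a symbol string."""
    return "".join(CODE[s] for s in symbols)


def decode(bits):
    """Decode a bit string by matching prefix codewords left to right."""
    inverse = {word: sym for sym, word in CODE.items()}
    out, word = [], ""
    for bit in bits:
        word += bit
        if word in inverse:
            out.append(inverse[word])
            word = ""
    return "".join(out)


def flip(bits, i):
    """Return the bit string with position i inverted."""
    return bits[:i] + ("1" if bits[i] == "0" else "0") + bits[i + 1:]


if __name__ == "__main__":
    sent = encode("abacad")
    print(decode(sent))           # abacad
    print(decode(flip(sent, 1)))  # aaaacad
```

A single flipped bit turns the six-symbol message into a seven-symbol one, so the decoder not only gets a symbol wrong but also loses alignment with the original symbol positions. With fixed-size codewords the word boundaries are unaffected by bit errors.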

Another indicator of coder robustness is its performance in background noise. In a real situation, the talker is often talking in the presence of typewriter noise or background conversation. This coder performs very well in the multispeaker case. Waveform coders generally have this advantage over frequency domain coders. It was observed in our simulation studies that various tones pass through this system with slight or no distortion. This excellent performance for tones is due to the fact that the harmonic scaling operations do not introduce any distortion for perfectly periodic signals.
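The statement about periodic signals can be illustrated with a toy version of 2:1 time-domain harmonic compression. The sketch below assumes the common cross-fade form of the operation, in which each block of two pitch periods is reduced to one period by a linearly decreasing (triangular) weight. The function name, the block alignment and the exact window are assumptions; the report's own implementation is not reproduced here.

```python
import math


def tdhs_compress(x, period):
    """Reduce every block of 2 * period samples to `period` samples.

    ASSUMPTION: output sample i of a block is a triangular cross-fade
    between sample i of the first period and sample i of the second.
    """
    out = []
    for start in range(0, len(x) - 2 * period + 1, 2 * period):
        for i in range(period):
            w = 1.0 - i / period  # weight falls linearly from 1 towards 0
            out.append(w * x[start + i] + (1.0 - w) * x[start + period + i])
    return out


if __name__ == "__main__":
    period = 8
    tone = [math.sin(2 * math.pi * i / period) for i in range(4 * period)]
    half = tdhs_compress(tone, period)
    worst = max(abs(half[i] - tone[i]) for i in range(len(half)))
    print(len(tone), len(half), worst)
```

When the input is exactly periodic with the assumed pitch period, the two samples being cross-faded are equal, so the output is simply the same waveform with half as many samples and no distortion is introduced.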


[Residue of appendix pages not recoverable from the scan: a table of window functions and their Fourier transforms W(f), including the window 0.54 + 0.46 cos(2πt/T0) for |t| < T0/2, and flow chart boxes labelled "Read original speech samples s(k)" and "Channel Error".]
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APPENDIX  C 


USER'S  GUIDE  AND  SOURCE  LISTINGS 

The system software developed for the speech coder at the bit rate of 9.6 kb/s consists of six program modules. They are as follows.

1. Options (OPTION.FTN): This program asks for the various options required by the transmitter and receiver programs and creates the OPT1.DAT and OPT2.DAT files to be used by the transmitter and the receiver program respectively.

2. Transmitter (TRAN.FTN): This combines the frequency compression and the quantization using the Adaptive Residual Coder. The performance and the statistics are printed on unit 6. The program asks for headers for various files and produces the quantizer level file (QUANT.DAT).

3. Encoder (ENCO.FTN): This program reads the quantizer levels and generates the codewords, which are written into the CODE.DAT file. The transmitter buffer (TBUFF.DAT) is also simulated.

4. Channel Error (CHER5763.FTN and CHER2631.FTN): There are two modules to simulate the noisy channel, with the error correction simulation incorporated in them. CHER5763.FTN uses the [57,63] Hamming code and the other module uses the [26,31] Hamming code. These programs ask for the percentage BER and the type of error correction used, and output the number of errors per frame. The residual errors and the BER after error correction are also displayed on the terminal.

5. Decoder (DECO.FTN): Reads the corrupted and/or corrected code words from the ERCO.DAT file, decodes them and puts them into the DECO.DAT file. The number of samples added or deleted due to channel errors can be found by noting the number of samples in the tail (which is displayed on the terminal) with and without errors. Note that this number is modulo 256. The receiver buffer (RBUFF.DAT) is also simulated in this program.

6. Receiver (RCVR.FTN): Reads the received quantizer levels, performs inverse quantization (ZHAT.DAT), does frequency expansion and writes the speech output to the SHAT.DAT file. This program brings all the quantizer levels into memory simultaneously. Hence, this program will give an error for sentences longer than 15600 samples. Longer sentences could be processed by increasing the Q buffer array size.

To run these modules an indirect command file is provided. The task files of the above modules and the ARC parameter file should exist in the same UIC. If the task files are not available, the following switches should be used to build them.

>ACTFIL=10
>UNITS=10
>ASG=SY:6

The indirect command file asks whether the various files are to be retained for further use or deleted.
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**"•*«*****-*****•*•••****••**«**’•'••***  «*«**♦*■*•*  *•****♦•»**•*•** 
;   *                                                        *
;   *            INDIRECT COMMAND FILE FOR                   *
;   *                                                        *
;   *            9.6 KB/S TDHS-ARC SYSTEM                    *
;
;   DATE : JUNE 30,1981
;   NAME : ARUN K. PANDE
;
;   THIS PROGRAM CONSISTS OF EXECUTION OF FIVE MODULES. THESE MODULES ARE
;
;   1. TRANSMITTER                TRAN.FTN
;   2. ENCODER                    ENCO.FTN
;   3. CHANNEL                    CHER5763.FTN
;      (+ ERROR CORR)             CHER2631.FTN
;   4. DECODER                    DECO.FTN
;   5. RECEIVER                   RCVR.FTN
;
;   TO RUN ABOVE INDIRECT COMMAND FILE, IT IS ASSUMED THAT THE TASKS OF
;   ABOVE MODULES EXIST IN THE SAME UIC. IF THEY DO NOT, BUILD THE TASKS
;   USING FOLLOWING SWITCHES COMMON TO ALL MODULES.
;
;   >ACTFIL=10
;   >UNITS=10
;   >ASG=DK1:6
;
.ENABLE SUBSTITUTION
.SETS S1 "QUANT.DAT;*,CODE.DAT;*,ERCO.DAT;*,DECO.DAT;*,ZHAT.DAT;*"
.SETS S2 "FOR006.DAT;*"
.SETS S3 "TBUF.DAT;*,RBUF.DAT;*"
RUN OPTION
.WAIT
RUN TRAN
.ASK DOPT1 DO YOU WANT TO DELETE TRANSMITTER OPTION FILE
.IFF DOPT1 .GOTO 10
.WAIT PIP
PIP OPT1.DAT;*/DE
.10: .WAIT
RUN ENCO
.20: .ASK A1 DO YOU WANT TO USE [57,63] HAMMING CODE
.IFT A1 .GOTO 30
.ASK A2 DO YOU WANT TO USE [26,31] HAMMING CODE
.IFT A2 .GOTO 40
.GOTO 20
.30: .WAIT
RUN CHER5763
.GOTO 50
.40: .WAIT
RUN CHER2631
.50: .WAIT
RUN DECO
.WAIT
RUN RCVR
.WAIT PIP
PIP 'S1'/DE
.ASK CHEK DO YOU WANT TO KEEP STATISTICS AND PITCH DATA
.IFT CHEK .GOTO 60
.WAIT PIP
PIP 'S2'/DE
.60: .WAIT
.ASK CHEKB DO YOU WANT TO KEEP TRAN AND RCVR BUFFER FILES
.IFT CHEKB .GOTO 70
.WAIT PIP
PIP 'S3'/DE
.70: .DELAY 01S
.WAIT PIP
PIP RECSAM.DAT;*,NEWBLO.DAT;*/DE
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THIS  PROGRAM  CREATES  THE  FILES  FOR  OPTIONS  REQUUIRED  IN  TRANSMITTER 
C     AND RECEIVER PROGRAMS.
C
      INTEGER*2 FNAME(16),IBT(3)
      DIMENSION GAMA(3)
C
      OPEN(UNIT=1,TYPE='NEW',NAME='OPT1.DAT',CARRIAGECONTROL='LIST')
C
      TYPE *,' ENTER THE ORIGINAL SPEECH FILENAME: [ < 32 CHAR ]'
      ACCEPT 10,FNAME
10    FORMAT(16A2)
      WRITE(1,10)FNAME
C
      TYPE *,' ENTER CLIPPING LEVEL FOR PITCH EXTRACTION:'
      TYPE *,' [ 1.0 = 100% CLIPPING; 0.0 = NO CLIPPING ]'
      TYPE *,' TYPICAL VALUE: 0.3 TO 0.6 '
      ACCEPT *,CLPP
      WRITE(1,*)CLPP
C
      TYPE *,' ENTER BLOCK LENGTH (KBLK), SEARCHING RANGE (ITMIN,ITMAX)'
      TYPE *,' FOR EXAMPLE: KBLK=80, ITMIN=20, ITMAX=100'
      ACCEPT *,KBLK,ITMIN,ITMAX
      WRITE(1,*)KBLK,ITMIN,ITMAX
C
      TYPE *,' ENTER ARC PARAMETER FILENAME: [ < 32 CHAR ]'
      ACCEPT 10,FNAME
      WRITE(1,10)FNAME
C
      TYPE *,' BLOCKLENGTH IN ARC TO CALCULATE SEGSNR, TYPICAL: 64'
      ACCEPT *,NBL
      WRITE(1,*)NBL
C
      TYPE *,' SELECT THE TYPE OF PITCH ESTIMATOR:'
      TYPE *,' 1 = AUTOCORRELATION; 2 = AMDF ; 3 = CLIPPED SP AUTO'
      TYPE *,' 4 = CLIPPED SP AMDF; 5 = 3-VALUE C CLIPPED AUTOCORRELATION'
      TYPE *,' 6 = 3-VALUE C CLIPPED SPEECH AMDF; 7 = 2-VALUE C CLIPPED '
      TYPE *,' SPEECH AUTOCORRELATION; 8 = 2-VALUE C CLIPPED AMDF'
      ACCEPT *,IOPT
      WRITE(1,*)IOPT
C
      TYPE *,' DO YOU WANT TO HAVE A BUFFER CONTROL? [ 1=YES; 2=NO ]'
      ACCEPT *,IBCNT
      WRITE(1,*)IBCNT
C
      IF(IBCNT .NE. 1)GOTO 20
      TYPE *,' ENTER BUFFER THRESHOLDS FOR BUFFER CONTROL:'
      TYPE *,' FOR EXAMPLE: [ 600, 800, 900 ]'
      ACCEPT *,IBT
      WRITE(1,*)IBT
      TYPE *,' ENTER THE SCALARS TO CHANGE THE QUANTIZER THRESHOLDS'
      TYPE *,' FOR EXAMPLE: [ 0.5, 0.7, 1.0]'
      ACCEPT *,GAMA
      WRITE(1,*)GAMA
20    CONTINUE
C
      TYPE *,' ENTER HAMMING ERROR CORRECTING CODE:'
      TYPE *,' ( INFO BITS, TOTAL NUMBER OF BITS ); FOR EXAMPLE:(57,63)'
      ACCEPT *,INFO,NTOT
      WRITE(1,*)INFO,NTOT
C
C     OPTIONS FOR THE RECEIVER PROGRAM.
C
      OPEN(UNIT=2,NAME='OPT2.DAT',TYPE='NEW',CARRIAGECONTROL='LIST')
      WRITE(2,10)FNAME
      TYPE *,' ENTER THE CLIPPING LEVEL FOR PITCH EXTRACTION AT RECEIVER:'
      TYPE *,' [ 1.0 = 100% CLIPPING; 0.0 = NO CLIPPING ]'
      TYPE *,' TYPICAL VALUE: 0.3 TO 0.6'
      ACCEPT *,CLP
      WRITE(2,*)CLP
C
      TYPE *,' SELECT THE TYPE OF PITCH ESTIMATOR:'
      TYPE *,' 1 = AUTOCORR; 2 = AMDF; 3 = CLPPD SP AUTOCORR'
      TYPE *,' 4 = CLPPD SP AMDF; 5 = 3-VALUE CENTER CLIPPED AUTOCORR'
      TYPE *,' 6 = 3-VAL. C CLPPD AMDF; 7 = 2-VALUE CENTER CLIPPED AUTOCORR'
      TYPE *,' 8 = 2-VALUE CENTER CLIPPED AMDF'
      ACCEPT *,IOPT1
      WRITE(2,*)IOPT1
C
      TYPE *,' ENTER BLOCK LENGTH (KBLK), SEARCHING RANGE (ITMIN,ITMAX)'
      TYPE *,' TYPICAL VALUES: 40, 20, 100'
      ACCEPT *,KBLK,ITMIN,ITMAX
      WRITE(2,*)KBLK,ITMIN,ITMAX
C
      STOP
      END
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TOHS  -  ARC  SYSTEM 
C     TRANSMITTER
C
C     BY ARUN K. PANDE
C
C     THIS IS A NEW APPROACH TO SPEECH DIGITIZATION AT MEDIUM BAND BIT
C     RATES OF 9.6 TO 16 KB/S. THE TECHNIQUE IS BASED ON A COMBINATION
C     OF TIME-DOMAIN HARMONIC SCALING ( TDHS ) AND ADAPTIVE RESIDUAL
C     CODER (ARC).
C
C     TDHS ALGORITHM CONSISTS OF PROPERLY WEIGHTING SEVERAL ADJACENT
C     INPUT SIGNAL SEGMENTS OF PITCH DEPENDENT DURATION BY SUITABLE
C     WINDOW FUNCTIONS. AS A RESULT OF THIS, THE NUMBER OF SAMPLES
C     TO BE TRANSMITTED CAN BE REDUCED BY A FACTOR OF TWO. IF THE BIT
C     RATE IS KEPT THE SAME, THE NUMBER OF BITS ALLOWED PER SAMPLE IS
C     DOUBLED, AND THE PERFORMANCE OF THE CODER CAN BE IMPROVED
C     SIGNIFICANTLY. WITH THE MORE SLOWLY VARYING FREQUENCY DIVIDED
C     SIGNAL AS INPUT, THE PREDICTION AND ASSOCIATED QUANTIZATION IN THE
C     ARC SYSTEM WILL PERFORM BETTER, THUS INCREASING THE PERFORMANCE
C     OF THE SYSTEM.
C
C     PROGRAM NAME: TRAN.FTN
C     DATE : JUNE 30,1981
C
C     THIS PROGRAM READS FOLLOWING OPTIONS:
C
C     1.  BLOCK LENGTH (KBLK), SEARCHING RANGE (ITMIN,ITMAX)
C     2.  NAME OF THE SPEECH FILE.
C     3.  NAME OF THE PARAMETER FILE.
C     4.  HEADERS FOR QUANTIZER OUTPUT
C     5.  OPTION REGARDING PITCH METHOD TO BE USED.
C     6.  BLOCK LENGTH NBL FOR SEGSNR.
C     7.  CLIPPING LEVEL TO GENERATE CLIPPED SPEECH
C     8.  BUFFER CONTROL OPTIONS.
C     9.  HAMMING CODE TO BE USED.
C     10. HEADER FOR PITCH PRINTOUT.
C
      INTEGER HD(40),IPICH(400),FNAME(16),SQ,Q,FILEN
      INTEGER*2 SPEECH(400),FBCNT,OLDQ
      DIMENSION Y(256),H(400),SQ(16),IBUFF1(512),IBUFF2(512)
      INTEGER*2 F1,F2,F3,F4,STAT
      COMMON /PRED/G,ND,RMSMIN,ALP,AINV,KQ,NSPSAM,A(12),DVHAT(12),EV,
     1 ISTAT1(40),EP,ALAD,SPERB1(200),SPERB2(200),V(12),SNRB(200)
     2 ,SNRQB(200)
      COMMON /RMS/RMS,NBL,IARG,ENGY1,ENGY2,ENGY3,ENGY4,SPER1,SPER2
      COMMON /CLIPP/CLPP
      COMMON /ADDN/F1,F2,F3,F4,BT1,BT2,BT3
     1 ,JCNT,STAT(30),GAMA(4)
      COMMON /FN/FILEN(16)
C
      MODN(K) = K - (K-1)/16 * 16
C
      OPEN(UNIT=3,TYPE='OLD',NAME='OPT1.DAT')      ! OPEN THE OPTION FILE.
C
C --  OPEN THE SPEECH FILE: READ THE HEADER AND THE SPEECH.
C
      READ(3,10)FNAME
10    FORMAT(16A2)
      OPEN(UNIT=1,TYPE='OLD',READONLY,NAME=FNAME,SHARED) ! OPEN SPEECH FILE
      READ(1,20)NSENT,IRATE,NSAMP,IUPPR,ILOWR,NTERMS,HD
20    FORMAT(6I5,10X,40A1)
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CONTINUE 

CREATE  NEW  FILE  FOR  QUANTIZER  OUTPUT.  WRITE 
THE  HEADER  ON  QUANTIZER  FILE. 

Tver' •N?T^uiUTtu!“ ' NEW' ’  NAME" 'QUANT . DAT' ,  CARRIAGECONTROL- ' L 1ST’ ) 
!«!!„*•  TVPE  THE  HEADER  FOR  THE  QUANTIZER  FILE:' 

ACCEPT  50,  HD 
FORMAT! 40A1 ) 

NSPSAM  «  N SAMP/2 

WRITE! 2,20! NSENT , IRATE .NSPSAM, IUPPR, I LOWR , NTERMS , HD 


READ! 3,*)CLPP 

C -- DEFINE AND INITIALIZE VARIOUS PARAMETERS.
      NUMP = 0
      READ(3,*)KBLK1,ITMIN,ITMAX   ! BLOCK LENGTH AND SEARCHING RANGE
      IOFF = 0
      ICNT = 0
      BUFCNT=0.
      JJ = 1
      NADD = 0
      F1 = 0
      F2 = 0
      F3 = 0
      F4 = 0
      JCNT = 1
      GAMA(1)=1.2
      NUM1 = KBLK1 + ITMAX
      READ(3,10)FILEN
      FILEN(32)=0
      CALL INSTRT
      READ(3,*)IOPT2   ! TYPE OF PITCH ESTIMATOR.
      READ(3,*)FBCNT   ! BUFFER CONTROL FLAG.
      IF(FBCNT .NE. 1)GOTO 155
      READ(3,*)BT1,BT2,BT3   ! BUFFER THRESHOLDS.
      READ(3,*)GAMA1,GAMA2,GAMA3   ! SCALARS TO CHANGE QUANT THRESHOLDS
      GAMA(2)=GAMA1
      GAMA(3)=GAMA2
      GAMA(4)=GAMA3
      GOTO 158
155   GAMA(2)=1.0
      GAMA(3)=1.0
      GAMA(4)=1.0
      BT1=3000.0
      BT2=4500.
      BT3=6000.
158   CONTINUE
      READ(3,*)NUMO1,NQPAR   ! HAMMING CODE.
      AVGBIT=(FLOAT(NUMO1)/FLOAT(NQPAR))*3.0


C -- READ INTO A BUFFER MAXIMUM LAG (ITMAX) PLUS BLOCKSIZE (KBLK1) SPEECH SAMPLES.
C -- CHECK IF NUM1 IS EXACT MULTIPLE OF 16.
      IF(MOD(NUM1,16) .EQ. 0)GOTO 55
C -- IF NOT, MAKE IT EXACT MULTIPLE OF 16.
      NUM1=(NUM1/16)*16+16
55    READ(1,30)(IBUFF1(I),I=1,NUM1)
      ICNT=ICNT+NUM1

C
C -- EXTRACT PITCH FROM SAMPLES IN BUFFER1.
C
      CALL PITCH(IBUFF1, NP, ITMAX, ITMIN, KBLK1, NUMP, NUM1, IOPT2)
      IPICH(JJ) = NP
      NP2 = NP + NP
C
56    CONTINUE
C
C -- CHECK IF NUMBER OF SAMPLES IN BUFFER1 IS LESS THAN TWO TIMES PITCH
C -- PERIOD SAMPLES -- IF LESS, ADD TO IT FROM SPEECH BUFFER AND THEN
C -- PROCEED TO FREQUENCY DIVISION.
C
      IF(NUM1 .LT. NP2)GOTO 61
C -- COMPUTE HOW MANY EXTRA SAMPLES IBUFF1 HAS.
      NUM12 = NUM1 - NP2
C -- IS NUM1 EXACTLY EQUAL TO NP2?
      IF(NUM12 .EQ. 0)GOTO 73
C -- IF NOT, TRANSFER EXTRA SAMPLES FROM IBUFF1 TO IBUFF2.
      DO 58 I=1,NUM12
      IBUFF2(I) = IBUFF1(NP2+I)
58    CONTINUE
      GOTO 73
C -- COMPUTE HOW MANY MORE SAMPLES ARE NEEDED IN IBUFF1 TO MAKE THE
C -- NUMBER EXACTLY EQUAL TO 2*NP.
61    MORE = NP2 - NUM1
C -- IS MORE EXACT MULTIPLE OF 16?
      IF(MOD(MORE,16) .EQ. 0)GOTO 70
C -- IF NOT, FIND THE NUMBER WHICH IS EXACT MULTIPLE OF 16 AND IS CLOSEST TO
C -- MORE.
      NUM2 = (MORE/16)*16 + 16
      READ(1,30,END=533)(SPEECH(I),I=1,NUM2)
533   CONTINUE
      ICNT = ICNT + NUM2
      IF(ICNT .GT. NSAMP)GOTO 999
      NUM3 = NUM2 - MORE
      DO 64 I=1,MORE
      IBUFF1(NUM1+I) = SPEECH(I)
64    CONTINUE
      DO 67 I=1,NUM3
      IBUFF2(I) = SPEECH(MORE + I)
67    CONTINUE
      NUM12 = NUM3
      GOTO 73
C -- SINCE "MORE" IS EXACT MULTIPLE OF 16, READ "MORE" SAMPLES FROM
C -- SPEECH FILE AND PUT IN IBUFF1.
70    READ(1,30,END=537)(IBUFF1(NUM1+I), I=1,MORE)
537   CONTINUE
      ICNT = ICNT + MORE
      IF(ICNT .GT. NSAMP)GOTO 999
      NUM12=0
C


C -- CALCULATE THE WINDOW FUNCTION.
73    CALL WINDOW(H,NP)   ! COMPUTE TRIANGULAR WINDOW.
C -- PERFORM FREQUENCY DIVISION OPERATION.
      DO 72 I = 1, NP   ! FREQUENCY DIVISION OPERATION.
      NUM4 = NP + I
      Y(I) = FLOAT(IBUFF1(NUM4)) + H(I) * FLOAT(IBUFF1(I) -
     1 IBUFF1(NUM4))
      IARG = NADD + I
      YY = Y(I)
C -- CALL ARC SUBROUTINE WHICH RETURNS THE QUANTIZER OUTPUT CORRESPONDING
C -- TO FREQUENCY DIVIDED SPEECH SAMPLE AND ALSO NOISY SPEECH SAMPLE.
      CALL ARC(YY,Q,YHAT)
      JQQ=Q
      CALL BUFCTL(BUFCNT,JQQ,AVGBIT)   ! BUFFER CONTROL IF OVERFLOW.
C -- WRITE THE QUANTIZER OUTPUT IN THE FILE.
      N1 = MODN(IARG)
      SQ(N1) = Q
      IF(N1 .EQ. 16) WRITE(2,33) SQ
72    CONTINUE
      NADD = NADD + NP
      NUM1 = KBLK1 + ITMAX
C -- THERE ARE ALREADY NUM12 SAMPLES IN IBUFF2. PUT (NUM1-NUM12) MORE
C -- SAMPLES IN IT.
      NSP = NUM1 - NUM12
C -- MAKE NSP EXACT MULTIPLE OF 16.
      NSP = (NSP/16)*16+16
      READ(1,30,END=541)(IBUFF2(NUM12+I),I=1,NSP)
541   CONTINUE
      ICNT=ICNT+NSP
      IF(ICNT .GT. NSAMP)GOTO 999
      NUMNSP = NUM12 + NSP
C -- EXTRACT PITCH
      JJ = JJ + 1
      CALL PITCH(IBUFF2,NP,ITMAX,ITMIN,KBLK1,NUMP,NUMNSP,IOPT2)
      IPICH(JJ) = NP
      NP2 = NP + NP
C -- DOES IBUFF2 HAVE SAMPLES LESS THAN 2*NP.
      IF(NUMNSP .LT. NP2)GOTO 85
C -- HOW MANY EXTRA SAMPLES IBUFF2 HAS?
      NUM12 = NUMNSP - NP2
C -- DOES IBUFF2 HAVE EXACT 2*NP SAMPLES?
      IF(NUM12 .EQ. 0)GOTO 82
C -- TRANSFER EXTRA SAMPLES TO IBUFF1.
      DO 79 I=1,NUM12
      IBUFF1(I)=IBUFF2(NP2+I)
79    CONTINUE
82    GOTO 88
C -- HOW MANY MORE SAMPLES ARE TO BE ADDED TO IBUFF2 SO THAT IT WILL
C -- HAVE 2*NP SAMPLES ?
85    MORE = NP2 - NUMNSP
C -- IS MORE EXACT MULTIPLE OF 16?
      IF(MOD(MORE,16) .EQ. 0)GOTO 91
C -- IF NOT, MAKE IT EXACT MULTIPLE OF 16.
      NUM2=(MORE/16)*16 + 16
      READ(1,30,END=544)(SPEECH(I),I=1,NUM2)
544   CONTINUE
      ICNT=ICNT+NUM2
      IF(ICNT .GT. NSAMP)GOTO 999
C -- HOW MANY EXTRA SAMPLES ARE READ THAN NEEDED.
      NUM3 = NUM2 - MORE
C -- TRANSFER "MORE" SAMPLES TO IBUFF2 AND NUM3 SAMPLES TO IBUFF1.
      DO 86 I=1,MORE
      IBUFF2(NUMNSP+I)=SPEECH(I)
86    CONTINUE
      DO 87 I=1,NUM3
      IBUFF1(I)=SPEECH(MORE+I)
87    CONTINUE
      NUM12=NUM3
      GOTO 88
91    READ(1,30,END=547)(IBUFF2(I+NUMNSP),I=1,MORE)
547   CONTINUE
      ICNT = ICNT+MORE
      NUM12=0
88    CALL WINDOW(H,NP)   ! COMPUTE TRIANGULAR WINDOW FUNCTION
      DO 92 I = 1, NP   ! FREQUENCY DIVISION OPERATION.
      NUM4 = NP + I
      Y(I) = FLOAT(IBUFF2(NUM4)) + H(I) * FLOAT(IBUFF2(I) -
     1 IBUFF2(NUM4))
      IARG = NADD + I
      YY = Y(I)
C -- CALL SUBROUTINE ARC.
      CALL ARC(YY,Q,YHAT)   ! QUANTIZE COMPRESSED SP.
      JQQ=Q
      CALL BUFCTL(BUFCNT,JQQ,AVGBIT)   ! CONTROL BUFFER IF OVERFLOW
      N1 = MODN(IARG)
      SQ(N1) = Q
      IF(N1 .EQ. 16) WRITE(2,33)SQ
92    CONTINUE
      NADD = NADD + NP
C -- THERE ARE ALREADY NUM12 SAMPLES IN IBUFF1. TRANSFER NSP=NUM1-NUM12
C -- SAMPLES TO IT.

C 

      NSP = NUM1 - NUM12
      NSP = (NSP/16)*16 + 16
      READ(1,30,END=551)(IBUFF1(NUM12+I),I=1,NSP)
551   CONTINUE
      ICNT=ICNT+NSP
      IF(ICNT.GT.NSAMP)GOTO 999
      NUMNSP=NUM12+NSP
C
      JJ = JJ + 1
C
      CALL PITCH(IBUFF1, NP, ITMAX, ITMIN, KBLK1, NUMP,NUMNSP,IOPT2)
      IPICH(JJ) = NP
      NP2 = NP + NP
      NUM1=NUMNSP
      GOTO 56
999   CONTINUE
C
      CALL INEND   ! PRINTOUT ON UNIT 6 ALL THE STATISTICS.
      TYPE *,' TYPE THE HEADER FOR THE PRINTOUT.(40CHAR ONLY)'
      ACCEPT 775,HD
775   FORMAT(40A1)
      WRITE(6,774)HD
774   FORMAT(1H1,40A1,//)
      WRITE(6,776)
776   FORMAT(////6X,' SAMPLE NUMBER ',6X,' PITCH PERIOD '//)
      ISTRT = 1
      IEND = KBLK1+ITMAX
      DO 777 I = 1,NUMP
      IP = IPICH(I)
      WRITE(6,778)ISTRT,IEND,IP
      I2P = IP + IP
      ISTRT = ISTRT + I2P
      IEND = IEND + I2P
777   CONTINUE
778   FORMAT(6X,I5,1X,'-',1X,I5,10X,I3)
33    FORMAT(16I2)
      STOP
      END
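The frequency division step of the main program (the DO 72 and DO 92 loops) can be illustrated outside the listing. The following is a minimal Python sketch, not the report's code: the function name `tdhs_compress` and the list-based interface are assumptions made for illustration. It blends two adjacent pitch periods into one with the weights produced by WINDOW, so that 2*NP input samples yield NP output samples.

```python
def tdhs_compress(segment, window):
    """Blend two adjacent pitch periods into one (2:1 compression).

    segment -- at least 2*NP samples (two consecutive pitch periods)
    window  -- NP weights, as returned by the listing's WINDOW routine
    Mirrors Y(I) = IBUFF(NP+I) + H(I) * (IBUFF(I) - IBUFF(NP+I)).
    """
    period = len(window)
    if len(segment) < 2 * period:
        raise ValueError("segment must hold at least two pitch periods")
    return [
        segment[period + i] + window[i] * (segment[i] - segment[period + i])
        for i in range(period)
    ]


# Hypothetical 4-sample pitch period with linearly decaying weights.
weights = [1.0, 2.0 / 3.0, 1.0 / 3.0, 0.0]
periodic = [1, 2, 3, 4, 1, 2, 3, 4]
print(tdhs_compress(periodic, weights))
```

For an exactly periodic input the two periods are identical, so the blend returns one period unchanged; otherwise the output fades from the first period into the second.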

C 

C 

C 

c  •  « 

C  *  PITCH  EXTRACTION  * 

C  *  * 

C 

c 

c 

c 

      SUBROUTINE PITCH(IBUF, NP, ITMAX, ITMIN, KBLK1, NUMP,NBUFF,IOPT2)
      DIMENSION A(200), IBUFF(512), IA(200), IBUF(512)
      COMMON /CLIPP/CLPP
C
      NUMP = NUMP + 1
      N1 = NBUFF/3
      N2 = N1+N1
      DO 5 I = 1,NBUFF
      IBUFF(I)=IBUF(I)
5     CONTINUE
      DO 10 I = 1, 200
      A(I) = 0.0
      IA(I) = 0
10    CONTINUE
C
C -- CHECK IF CENTER CLIPPING IS ASKED FOR.
C
      IF(IOPT2 .LE. 2) GOTO 280
C 

C -- FIND OUT ABSOLUTE MAXIMUM OUT OF NBUFF SAMPLES IN THE BUFFER "IBUFF"
      IBIG1 = 0
      IBIG2 = 0
      DO 120 I = 1,N1   ! FIND LARGEST SAMPLE IN 1ST OF 3 PARTS.
      ISPABS=ABS(IBUFF(I))
      IF(ISPABS .GT. IBIG1) IBIG1 = ISPABS
120   CONTINUE
C
      DO 121 I = N2, NBUFF   ! FIND LARGEST SAMPLE IN 3RD OF 3 PARTS.
      ISPABS=ABS(IBUFF(I))
      IF(ISPABS .GT. IBIG2) IBIG2=ISPABS
121   CONTINUE
      IB=IBIG1-IBIG2
      IF(IB .GE. 0)IBIG=IBIG2   ! FIND MINIMUM OF TWO LARGE VALUES.
      IF(IB .LT. 0)IBIG=IBIG1
C
C -- ENTER THE CLIPPING LEVEL.
C
      CL = CLPP * FLOAT(IBIG)
C
C -- CHECK IF CENTER CLIPPING WITH THREE OR TWO VALUES IS REQUIRED.
C
      IF(IOPT2 .GT. 4) GOTO 155   ! YES, 2 OR 3 VALUE CLIPPING.
C
C -- CLIP THE SPEECH WAVEFORM.
      CLM=-CL
      DO 140 I=1,NBUFF   ! GENERATE CLIPPED SPEECH.
      XFLT=FLOAT(IBUFF(I))
      IF((XFLT .LE. CL) .AND. (XFLT .GT. CLM))GOTO 147
      IF(XFLT .GT. CL)IBUFF(I)=IBUFF(I)-IFIX(CL)
      IF(XFLT .LT. CLM)IBUFF(I)=IBUFF(I)+IFIX(CL)
      GOTO 140
147   IBUFF(I)=0
140   CONTINUE
      GOTO 280
155   CONTINUE
      IF(IOPT2 .GT. 6)GOTO 360   ! YES, 2-VALUE CLIPPING.
      CLM=-CL
      DO 160 I=1,NBUFF   ! GENERATE 3-VALUE CLIPPED SP.
      XFLOT=FLOAT(IBUFF(I))
      IF((XFLOT .LE. CL) .AND. (XFLOT .GT. CLM))IBUFF(I)=0
      IF(XFLOT .GT. CL)IBUFF(I)=+1
      IF(XFLOT .LE. -CL)IBUFF(I)=-1
160   CONTINUE
      IF(IOPT2 .GT. 5)GOTO 220
C 

C -- COMPUTE AUTOCORRELATION FUNCTIONS
      IBIGG=-1000
      DO 180 IT=ITMIN,ITMAX   ! 3-VALUE C CLPPD AUTOCORR METHOD.
      ISUM=0
      DO 170 J=1,KBLK1
      IF(((IBUFF(J+IT).LT.0).AND.(IBUFF(J).LT.0)) .OR. ((IBUFF(J+IT)
     1 .GT.0).AND.(IBUFF(J).GT.0)))ISUM=ISUM+1
      IF(((IBUFF(J+IT).LT.0).AND.(IBUFF(J).GT.0)) .OR. ((IBUFF(J+IT)
     1 .GT.0).AND.(IBUFF(J).LT.0)))ISUM=ISUM-1
170   CONTINUE
      IA(IT)=ISUM
      IF(IBIGG .LT. IA(IT)) IBIGG=IA(IT)
180   CONTINUE
      DO 190 I=ITMIN,ITMAX
      IF(IBIGG .EQ. IA(I))GOTO 200
190   CONTINUE
200   NP=I
      WRITE(5,*)NP
      RETURN
C
C -- CALCULATE AMDF FUNCTIONS
220   CONTINUE
      ISMALL=4096
      DO 230 IT=ITMIN,ITMAX   ! 3-VALUE C CLPPD AUTOCORR METHOD.
      ISUM=0
      DO 240 J=1,KBLK1
      IF(((IBUFF(J+IT).EQ.0).AND.(IBUFF(J).NE.0)) .OR.
     1 ((IBUFF(J+IT).NE.0).AND.(IBUFF(J).EQ.0)))ISUM=ISUM+1
      IF(((IBUFF(J+IT).GT.0).AND.(IBUFF(J).LT.0)) .OR.
     1 ((IBUFF(J+IT).LT.0).AND.(IBUFF(J).GT.0)))ISUM=ISUM+2
240   CONTINUE
      IA(IT) = ISUM
      IF(ISMALL .GE. IA(IT))ISMALL=IA(IT)
230   CONTINUE
      DO 250 I=ITMIN,ITMAX
      IF(ISMALL .EQ. IA(I))GOTO 260
250   CONTINUE
260   NP=I
      WRITE(5,*)NP
      RETURN

280   CONTINUE
      IF((IOPT2.EQ.2).OR.(IOPT2.EQ.4))GOTO 340
      BIG=0.0
      DO 300 IT=ITMIN,ITMAX
      SUM=0.0
      DO 290 J=1,KBLK1
      SUM=SUM+FLOAT(IBUFF(J+IT))*FLOAT(IBUFF(J))
290   CONTINUE
      A(IT)=SUM
      IF(BIG .LT. A(IT))BIG=A(IT)
300   CONTINUE
      DO 310 I=ITMIN,ITMAX
      IF(BIG.EQ.A(I))GOTO 320
310   CONTINUE
320   NP=I
      WRITE(5,*)NP
      RETURN
340   CONTINUE
      SMALL = 1.0E+09
      DO 60 IT = ITMIN, ITMAX
      SUM = 0.0
      DO 50 J = 1, KBLK1
      SUM = SUM + ABS(FLOAT(IBUFF(J+IT) - IBUFF(J)))
50    CONTINUE
      A(IT) = SUM
      IF(SMALL .GE. A(IT)) SMALL = A(IT)
60    CONTINUE
      DO 70 I = ITMIN, ITMAX
      IF(SMALL .EQ. A(I)) GOTO 80
70    CONTINUE
80    CONTINUE
      NP = I
      WRITE(5,*)NP
      RETURN
360   CONTINUE
C -- CLIP THE SPEECH TO TWO VALUES.
      CL=0.0
      DO 380 I = 1,NBUFF
      XFLOT=FLOAT(IBUFF(I))
      IF(XFLOT .LE. CL)IBUFF(I)=-1
      IF(XFLOT .GT. CL)IBUFF(I)=1
380   CONTINUE
      IF(IOPT2 .GT. 7)GOTO 480
      IBIGG = -1000
      DO 420 IT=ITMIN,ITMAX
      ISUM=0
      DO 400 J=1,KBLK1
      IF(IBUFF(J+IT) .EQ. IBUFF(J))ISUM=ISUM+1
      IF(IBUFF(J+IT) .NE. IBUFF(J))ISUM=ISUM-1
400   CONTINUE
      IA(IT)=ISUM
      IF(IBIGG .LT. IA(IT))IBIGG=IA(IT)
420   CONTINUE
      DO 440 I=ITMIN,ITMAX
      IF(IBIGG .EQ. IA(I))GOTO 460
440   CONTINUE
460   NP=I
      WRITE(5,*)NP
      RETURN
480   CONTINUE
      ISMALL=4096
      DO 500 IT=ITMIN,ITMAX
      ISUM=0
      DO 490 J=1,KBLK1
      IF(IBUFF(J+IT) .NE. IBUFF(J))ISUM=ISUM+2
490   CONTINUE
      IA(IT)=ISUM
      IF(ISMALL .GE. IA(IT))ISMALL=IA(IT)
500   CONTINUE
      DO 520 I=ITMIN,ITMAX
      IF(ISMALL .EQ. IA(I))GOTO 540
520   CONTINUE
540   NP=I
      WRITE(5,*)NP
      RETURN
      END
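The three-level clipping branch of PITCH (IOPT2 = 5) reduces each sample to +1, 0 or -1 and then picks the lag with the largest sign correlation. The Python sketch below restates that path under stated assumptions: the function names and the list-based interface are hypothetical, and only the clipped autocorrelation option is shown, not the AMDF or unclipped variants.

```python
def three_level_clip(samples, clip_fraction):
    """Map samples to +1, 0 or -1 around a level tied to the buffer peaks."""
    third = len(samples) // 3
    if third == 0:
        raise ValueError("need at least three samples")
    peak_first = max(abs(s) for s in samples[:third])
    # The listing scans I = N2..NBUFF (1-based), i.e. index 2*third-1 onward.
    peak_last = max(abs(s) for s in samples[2 * third - 1:])
    level = clip_fraction * min(peak_first, peak_last)
    clipped = []
    for s in samples:
        if s > level:
            clipped.append(1)
        elif s <= -level:
            clipped.append(-1)
        else:
            clipped.append(0)
    return clipped


def pitch_from_clipped(clipped, lag_min, lag_max, block_len):
    """Return the first lag in [lag_min, lag_max] with the largest correlation."""
    if len(clipped) < block_len + lag_max:
        raise ValueError("buffer shorter than block_len + lag_max")
    best_lag, best_score = lag_min, None
    for lag in range(lag_min, lag_max + 1):
        # For values in {-1, 0, +1} the product is +1 for equal signs and
        # -1 for opposite signs, matching the two IF tests in the listing.
        score = sum(clipped[j] * clipped[j + lag] for j in range(block_len))
        if best_score is None or score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

Because the clipped signal takes only three values, the correlation needs no multiplications in hardware, which is the reason the report gives for the modest hardware requirements.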


C     *  WINDOW FUNCTION  *
C
      SUBROUTINE WINDOW(H,NP)
      DIMENSION H(400)
      DO 10 I = 1, NP
      H(I) = 1.0 - FLOAT(I-1)/FLOAT(NP-1)
10    CONTINUE
      RETURN
      END
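WINDOW fills H with a linear ramp from 1 down to 0 over one pitch period. As a small illustration (a Python sketch with a hypothetical function name, not part of the listing):

```python
def triangular_window(period):
    """Weights H(I) = 1 - (I-1)/(NP-1) for I = 1..NP, as a 0-based list."""
    if period < 2:
        # The listing divides by NP-1, so a one-sample period is not usable.
        raise ValueError("period must be at least 2")
    return [1.0 - i / (period - 1) for i in range(period)]


print(triangular_window(5))
```

The first weight is 1 and the last is 0, so the compressed output starts on the first pitch period and ends on the second.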


C     *  QUANTIZER  *
C
C -- QUANTIZER IS VARIABLE LEVEL QUANTIZER. NUMBER OF LEVELS ARE KQ.
      SUBROUTINE QUANT(X,Y,I)
      INTEGER*2 F1,F2,F3,F4,STAT
      COMMON /QUAN/T(20),OUT(20),EXPN(20),SIZE,SMIN,NQ
      COMMON /RMS/RMS,NBL,IARG
      COMMON /ADDN/F1,F2,F3,F4,BT1,BT2,BT3
     1 ,JCNT,STAT(30),GAMA(4)
      X1=ABS(X/SIZE)
      F=0.5
      IF(X.LT.0.)F=-.5
      I=1
      DO 20 K=1,NQ
      IF(JCNT .EQ. 1)TNEW=T(K)   ! OLD THRESHOLDS IF BUFFER SMALL.
      IF(JCNT .NE. 1)TNEW=GAMA(JCNT)*T(K)   ! NEW THRESHOLD
20    IF(X1.GE.TNEW) I=2*K+.5+F
      J=(I+2)/2
      Y=2.*F*OUT(J)*SIZE
      SIZE=EXPN(J)*SIZE
      SIZE=AMAX1(SIZE,RMS*SMIN)
      RETURN
      END
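QUANT normalizes the residual by the current step size, finds the highest threshold it reaches, and then rescales the step size by an expansion factor tied to the chosen level. The Python sketch below follows that flow. It is an illustration under assumptions: the function name, the `state` dictionary and the 0-based tables are hypothetical, and the threshold scaling by `gamma` follows one reading of a partly illegible line in the listing.

```python
def adaptive_quantize(x, state, thresholds, outputs, expansions,
                      rms, smin, gamma=1.0):
    """Quantize x; returns (level index, reconstructed value).

    state      -- dict holding the adaptive step size under key 'size'
    thresholds -- NQ decision thresholds (normalized magnitudes)
    outputs    -- NQ+1 reconstruction magnitudes, outputs[0] innermost
    expansions -- NQ+1 step-size multipliers, one per magnitude level
    """
    magnitude = abs(x / state["size"])
    exceeded = 0
    for k, threshold in enumerate(thresholds, start=1):
        if magnitude >= gamma * threshold:
            exceeded = k
    if exceeded == 0:
        level = 1                      # innermost level, sign not coded
    else:
        level = 2 * exceeded + (1 if x >= 0 else 0)
    sign = 1.0 if x >= 0 else -1.0
    y = sign * outputs[exceeded] * state["size"]
    # Adapt the step size, but never let it fall below rms * smin.
    state["size"] = max(expansions[exceeded] * state["size"], rms * smin)
    return level, y
```

With NQ thresholds this yields 2*NQ+1 level indices, which agrees with the eleven-entry code table used later by the entropy coder when NQ is 5.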


C     *  INITIALIZATION  *
C
C -- PARAMETERS ARE DEFINED AND INITIALIZED.
      SUBROUTINE INSTRT
      INTEGER FILEN
      COMMON /QUAN/T(20),OUT(20),EXPN(20),SIZE,SMIN,NQ
      COMMON /PRED/G,N,RMSMIN,ALP,AINV,KQ,NSPSAM,A(12),VHAT(12),EV
     1 ,ISTAT1(40),EP,ALAD,SPERB1(200),SPERB2(200),V(12),SNRB(200)
     2 ,SNRQB(200)
      COMMON /RMS/RMS,NBL,IARG,ENGY1,ENGY2,ENGY3,ENGY4,SPER1,SPER2
      COMMON /INIT/JI
      COMMON /FN/FILEN(16)
      READ(3,40)FILEN   ! READ PARAMETERS FOR ARC SYSTEM.
40    FORMAT(16A2)
      OPEN(UNIT=8,NAME=FILEN,TYPE='OLD')
      READ(8,*)AINV,ALP,ALAD,G,N,RMSMIN,SMIN
      WRITE(6,2)AINV,ALP,ALAD,G,N,RMSMIN,SMIN
2     FORMAT(/6X,'AINV=',F5.2,2X,'ALP=',F5.2,2X,'ALAD=',F5.2,2X,'G=',
     1 F5.3,2X,'N=',I2,2X,'RMSMIN=',F5.1,2X,'SMIN=',F5.2/)
      READ(8,*)KQ
      WRITE(6,3)KQ
3     FORMAT(6X,'NUMBER OF QUANT LEVELS=',I2)
      NQ=KQ/2
      NQQQ=NQ+1
      READ(8,*)(EXPN(I),I=1,NQQQ)
      WRITE(6,4)(I,EXPN(I),I=1,NQQQ)
4     FORMAT(6X,6('EXPN(',I2,')=',F6.2,2X))
      READ(8,*)(OUT(I),I=1,NQQQ)
      WRITE(6,5)(I,OUT(I),I=1,NQQQ)
5     FORMAT(6X,6('OUT(',I2,')=',F6.2,2X))
      DO 30 I=1,NQ
30    T(I)=(OUT(I)+OUT(I+1))/2.
      SIZE=100.
      DO 118 I=1,12
      VHAT(I)=0.
      V(I)=0.
118   A(I)=0.
      RMS=RMSMIN
      A(1)=AINV
      EV=0.
      EP=0.
      ENGY1=0.
      ENGY2=0.
      JI=0
      ENGY3=0.
      ENGY4=0.
      SPER1=0.
      SPER2=0.
      READ(3,*)NBL   ! BLOCK LENGTH TO CALCULATE SEGSNR.
      DO 621 I=1,KQ
621   ISTAT1(I)=0
      RETURN
      END


C     *  ADAPTIVE RESIDUAL CODER  *
C
C -- SUBROUTINE ARC RETURNS QUANTIZER OUTPUT AND YHAT.
C
      SUBROUTINE ARC(Y,Q,YHAT)
      INTEGER Q
      COMMON /PRED/G,N,RMSMIN,ALP,AINV,KQ,NSPSAM,A(12),VHAT(12),
     1 EV,ISTAT1(40),EP,ALAD,SPERB1(200),SPERB2(200),V(12),SNRB(200)
     2 ,SNRQB(200)
      COMMON /RMS/RMS,NBL,IARG,ENGY1,ENGY2,ENGY3,ENGY4,SPER1,SPER2
      COMMON /INIT/JI
C
C -- PREDICTION.
C
      PRE=0.
      PRE1=0.
      DO 120 I = 1, N
      PRE1=PRE1+A(I)*V(I)
120   PRE=PRE+A(I)*VHAT(I)
      RMS=ALP*(RMS-RMSMIN)+(1.-ALP)*ABS(VHAT(1))+RMSMIN
      ERROR=Y-PRE
      ERROR1=Y-PRE1
      CALL QUANT(ERROR,EQ,IOUT)
      ISTAT1(IOUT)=ISTAT1(IOUT)+1
      Q=IOUT
      DO 125 I=1,N
      J=N+2-I
      V(J)=V(J-1)
125   VHAT(J)=VHAT(J-1)
      VHAT(1)=PRE+EQ
      V(1)=Y
      YHAT=VHAT(1)
C
C -- ADAPTATION.
C
      ERR=G*EQ/RMS**2
      A(1)=A(1)+AINV*(1./ALAD-1.)
      DO 130 I=1,N
130   A(I)=A(I)*ALAD+ERR*VHAT(I+1)
C
C -- UPDATE PAST.
C
      IRM=MOD(IARG,NBL)
      ENGY1=ENGY1+Y**2
      ENGY2=ENGY2+(ERROR-EQ)**2
      ENGY3=ENGY3+ERROR**2
      ENGY4=ENGY4+ERROR1**2
      IF(IRM .NE. 0)GOTO 133
      JI=JI+1
      EV=EV+ENGY1
      EP=EP+ENGY2
      IF(ENGY3 .NE. 0.)SPERB1(JI)=10.*ALOG10(ENGY1/ENGY3)
      IF(ENGY4 .NE. 0.)SPERB2(JI)=10.*ALOG10(ENGY1/ENGY4)
      IF(ENGY2 .NE. 0.)SNRB(JI)=10.*ALOG10(ENGY1/ENGY2)
      IF(ENGY2 .NE. 0.)SNRQB(JI)=10.*ALOG10(ENGY3/ENGY2)
      SPER1=SPER1+ENGY3
      SPER2=SPER2+ENGY4
      ENGY1=0.
      ENGY2=0.
      ENGY3=0.
      ENGY4=0.
133   CONTINUE
      RETURN
      END
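One pass through ARC predicts the sample from past reconstructed outputs, quantizes the residual, and then adapts the predictor coefficients with a leaky gradient step. The Python sketch below walks through those steps for a single sample. It is illustrative only: the function name, the `state` dictionary and the `quantize` callable (standing in for QUANT) are assumptions, and the segmental SNR bookkeeping of the listing is left out.

```python
def arc_step(y, state, quantize, alp, rmsmin, g, ainv, alad):
    """Code one sample; returns (quantized residual, reconstructed sample).

    state    -- dict with 'a' (N coefficients), 'vhat' (N+1 past outputs)
                and 'rms' (running magnitude estimate)
    quantize -- callable mapping the residual to its quantized value
    """
    a, vhat = state["a"], state["vhat"]
    order = len(a)
    prediction = sum(a[i] * vhat[i] for i in range(order))
    state["rms"] = (alp * (state["rms"] - rmsmin)
                    + (1.0 - alp) * abs(vhat[0]) + rmsmin)
    eq = quantize(y - prediction)
    # Shift the history and store the new reconstructed sample in front.
    vhat[1:] = vhat[:-1]
    vhat[0] = prediction + eq
    # Leaky gradient update: a[0] is pulled toward ainv, the rest toward 0.
    err = g * eq / state["rms"] ** 2
    a[0] += ainv * (1.0 / alad - 1.0)
    for i in range(order):
        a[i] = a[i] * alad + err * vhat[i + 1]
    return eq, vhat[0]
```

The decoder can run the same update from the quantized residual alone, since the prediction uses only reconstructed samples; that is what makes the predictor sequentially adaptive without side information.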
C 
C 

C     *  STATISTICS AND RESULTS  *
C
C -- SUBROUTINE INEND WRITES ALL THE RESULTS SUCH AS SNR, H
C -- AND STATISTICS.
C
      SUBROUTINE INEND
      REAL PROB(30)
      INTEGER*2 STAT,F1,F2,F3,F4
      COMMON /PRED/G,N,RMSMIN,ALP,AINV,KQ,NSPSAM,A(12),
     1 VHAT(12),EV,ISTAT1(40),EP,ALAD,SPERB1(200),SPERB2(200),V(12)
     2 ,SNRB(200),SNRQB(200)
      COMMON /RMS/RMS,NBL,IARG,ENGY1,ENGY2,ENGY3,ENGY4,SPER1,SPER2
      COMMON /ADDN/F1,F2,F3,F4,BT1,BT2,BT3
     1 ,JCNT,STAT(30),GAMA(4)
C
      SNR=10.*ALOG10(EV/EP)
      SPER=10.*ALOG10(EV/SPER1)
      SPERI=10.*ALOG10(EV/SPER2)
      SNRQ=10.*ALOG10(SPER1/EP)
      ISUM=0
      DO 300 II=1,KQ
300   ISUM=ISUM+ISTAT1(II)
      ARG1=ISUM
      SUM=ALOG(ARG1)
      DO 500 I=1,KQ
      ARG=ISTAT1(I)+0.001
500   SUM=SUM-(ARG*ALOG(ARG))/ARG1
      BITS=SUM/ALOG(2.)
      WRITE(6,402)SNR,BITS,(ISTAT1(I),I=1,KQ)
402   FORMAT(6X,' SNR INLOOP= ',F7.3,3X,'H=',F4.2,3X,'OP',
     1 10I5,3(/22X,10I5)//)
      WRITE(5,402)SNR,BITS,(ISTAT1(I),I=1,KQ)
      WRITE(6,404)
404   FORMAT(///6X,' SAMPLE NUMBER ',6X,' SNR ',6X,' SPER ',6X,
     1 ' SPERI ',6X,' SNRQ '//)
C
      NB=ISUM/NBL
C
      DO 408 I = 1,NB
      IS=(I-1)*NBL+1
      IE=IS+NBL-1
      WRITE(6,412)IS,IE,SNRB(I),SPERB1(I),SPERB2(I),SNRQB(I)
408   CONTINUE
412   FORMAT(6X,I5,'-',I5,6X,F7.2,4X,F7.2,4X,F7.2,4X,F7.2)
      WRITE(6,416)SPER,SPERI,SNRQ
416   FORMAT(//6X,' PREDICTOR PERFORMANCE =',F8.2/6X,
     1 ' PREDICTOR IDEAL PERFORMANCE= ',F8.2/6X,' SIGNAL TO NOISE
     2 RATIO=',F8.2)
C
      NQUA=0
      DO 418 I=1,16
418   NQUA=NQUA+STAT(I)
      DO 420 I=1,16
      PROB(I)=FLOAT(STAT(I))/FLOAT(NQUA)
420   CONTINUE
      WRITE(6,422)
422   FORMAT(//,6X,'LEVEL NUMBER',6X,'PROBABILITY',6X,' FREQUENCY',/)
424   FORMAT(9X,I2,12X,F7.4,10X,I5)
      DO 426 I=1,16
      WRITE(6,424)I,PROB(I),STAT(I)
426   CONTINUE
      RETURN
      END
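The quantity printed as H in INEND is the first-order entropy of the quantizer level counts, computed as log(N) minus the count-weighted mean of log(count), then converted to bits. A short Python sketch of the same arithmetic follows; the function name is hypothetical, and the 0.001 offset is carried over from the listing, where it keeps the logarithm defined for unused levels.

```python
import math


def level_entropy_bits(counts):
    """Entropy in bits per sample of the level histogram `counts`."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("at least one level must have been used")
    h = math.log(total)
    for count in counts:
        arg = count + 0.001          # same guard against log(0) as the listing
        h -= arg * math.log(arg) / total
    return h / math.log(2.0)


print(round(level_entropy_bits([50, 50]), 3))
```

The offset introduces a small bias, so two equally likely levels give a value just below 1 bit rather than exactly 1.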
C 

C     *  BUFFER CONTROL  *
C
      SUBROUTINE BUFCTL(BUFCNT,J,AVGBIT)
      INTEGER*2 STAT,F1,F2,F3,F4
      COMMON /ADDN/F1,F2,F3,F4,BT1,BT2,BT3
     1 ,JCNT,STAT(30),GAMA(4)
C
      IF(F1 .EQ. 1)GOTO 50
      IF(F2 .EQ. 1)GOTO 60
      IF(F3 .EQ. 1)GOTO 70
      IF(J .LE. 3) GOTO 80
      STAT(J)=STAT(J)+1
      BUFCNT=BUFCNT+4.-AVGBIT
      F4=0
      GOTO 100
50    CONTINUE
      IF(J .EQ. 1)GOTO 53
      IF(J .EQ. 2)GOTO 153
      IF(J .EQ. 3)GOTO 155
      STAT(J)=STAT(J)+1
      STAT(1)=STAT(1)+1
      F1=0
      BUFCNT=BUFCNT+4.-AVGBIT
      BUFCNT=BUFCNT+4.-AVGBIT
      F4=1
      GOTO 100
53    CONTINUE
      STAT(12)=STAT(12)+1
      BUFCNT=BUFCNT+4.-AVGBIT-AVGBIT
      F4=0
      F1=0
      GOTO 100
153   F2=1
      STAT(1)=STAT(1)+1
      F1=0
      BUFCNT=BUFCNT+4.-AVGBIT
      GOTO 100
155   F3=1
      STAT(1)=STAT(1)+1
      F1=0
      BUFCNT=BUFCNT+4.-AVGBIT
      GOTO 100
60    CONTINUE
      IF(J .EQ. 1)GOTO 63
      IF(J .EQ. 2)GOTO 65
      IF(J .EQ. 3)GOTO 67
      STAT(2)=STAT(2)+1
      BUFCNT=BUFCNT+4.-AVGBIT
      STAT(J)=STAT(J)+1
      BUFCNT=BUFCNT+4.-AVGBIT
      F4=1
      F2=0
      GOTO 100
63    STAT(13)=STAT(13)+1
      BUFCNT=BUFCNT+4.-AVGBIT-AVGBIT
      F2=0
      F4=0
      GOTO 100
65    STAT(14)=STAT(14)+1
      BUFCNT=BUFCNT+4.-AVGBIT-AVGBIT
      F4=0
      F2=0
      GOTO 100
67    STAT(2)=STAT(2)+1
      BUFCNT=BUFCNT+4.-AVGBIT
      F3=1
      F2=0
      F4=0
      GOTO 100
70    CONTINUE
      IF(J .EQ. 1)GOTO 72
      IF(J .EQ. 2)GOTO 74
      IF(J .EQ. 3)GOTO 76
      STAT(3)=STAT(3)+1
      BUFCNT=BUFCNT+4.-AVGBIT
      STAT(J)=STAT(J)+1
      BUFCNT=BUFCNT+4.-AVGBIT
      F3=0
      F4=1
      GOTO 100
72    STAT(15)=STAT(15)+1
      BUFCNT=BUFCNT+4.-AVGBIT-AVGBIT
      F3=0
      F4=0
      GOTO 100
74    STAT(3)=STAT(3)+1
      BUFCNT=BUFCNT+4.-AVGBIT
      F2=1
      F4=0
      F3=0
      GOTO 100
76    STAT(16)=STAT(16)+1
      BUFCNT=BUFCNT+4.-AVGBIT-AVGBIT
      F4=0
      F3=0
      GOTO 100
80    CONTINUE
      IF(J .EQ. 1)F1=1
      IF(J .EQ. 2)F2=1
      IF(J .EQ. 3)F3=1
100   CONTINUE
      IF(BUFCNT .GT. BT1)JCNT=2
      IF(BUFCNT .GT. BT2)JCNT=3
      IF(BUFCNT .GT. BT3)JCNT=4
      IF(BUFCNT .LT. BT1)JCNT=1
      IF(BUFCNT .LT. 0.)BUFCNT=0.
      RETURN
      END
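The tail of BUFCTL (from label 100 onward) turns the running buffer count into the index JCNT that QUANT uses to choose its threshold scaling. The Python sketch below isolates that step together with the basic count update. It is a simplified illustration: the function name and the `samples` argument are assumptions, and the flag logic that pairs low levels (F1 to F4 and the STAT counters) is not reproduced.

```python
def update_buffer(buffer_bits, jcnt, avg_bits, bt1, bt2, bt3, samples=1):
    """Add one 4-bit codeword, remove the drained bits, select JCNT (1..4).

    samples -- 1 when the codeword carries one sample, 2 when it carries a
               pair (the listing subtracts AVGBIT once or twice accordingly)
    """
    buffer_bits += 4.0 - samples * avg_bits
    if buffer_bits > bt1:
        jcnt = 2
    if buffer_bits > bt2:
        jcnt = 3
    if buffer_bits > bt3:
        jcnt = 4
    if buffer_bits < bt1:
        jcnt = 1
    if buffer_bits < 0.0:
        buffer_bits = 0.0
    return buffer_bits, jcnt
```

With the default thresholds of the main program (3000, 4500 and 6000), a buffer that climbs past 3000 moves the quantizer from its original thresholds to the first scaled set, and it returns to the original set once the count falls below 3000 again.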


C     *  ENTROPY CODING  *
C
C -- PROGRAM NAME: ENCO.FTN
C -- DATE: JUNE 30,1981
C
C -- THIS PROGRAM READS THE QUANTIZER LEVELS, SORTS THEM OUT ACCORDING TO
C -- RUN LENGTH AND PUTS APPROPRIATE CODE WORDS IN OUTPUT FILE.
C
      INTEGER*2 BIN(256),BOUT(512),TBU(256),HD(40),BOUT1(256)
      DIMENSION IVECT(11)
      EQUIVALENCE (BOUT1(1),BOUT(1))
      DATA IVECT/0,1,4,2,5,3,6,11,7,15,14/
      KBIT=0
      LAST=0
      NEWBLO=0
      TYPE *,' TYPE THRESHOLD TO SWITCH CODE TO PREVENT BUFFER UNDERFLOW'
      TYPE *,' TYPICAL VALUE: 100'
      ACCEPT *,ITH
      IBCNT=0
      OPEN(UNIT=1,TYPE='OLD',NAME='QUANT.DAT')
      OPEN(UNIT=2,TYPE='NEW',NAME='CODE.DAT',CARRIAGECONTROL='LIST')
      OPEN(UNIT=3,TYPE='NEW',NAME='TBUF.DAT',CARRIAGECONTROL='LIST')
      TYPE *,' TYPE HAMMING CODE(INFO BITS, CHECK BITS'
      ACCEPT *,INFOB,NPARI
      READ(1,7)NSENT,IRATE,NSAMP,IUPPR,ILOWR,NTERMS,HD
7     FORMAT(6I5,10X,40A1)
      TYPE *,' TYPE THE HEADER FOR BUFFER FILE:'
      ACCEPT 8,HD
8     FORMAT(40A1)
      WRITE(3,7)NSENT,IRATE,NSAMP,IUPPR,ILOWR,NTERMS,HD
      TYPE *,' TYPE HEADER FOR ENCODER OUTPUT FILE:'
      ACCEPT 8,HD
      WRITE(2,7)NSENT,IRATE,NSAMP,IUPPR,ILOWR,NTERMS,HD
      NBLO = NSAMP/256
      J=0
      DO 10 IB=1,NBLO   ! BLOCK LOOP
      READ(1,100,END=333)BIN
100   FORMAT(16I2)

      IP=1
      I=0
13    I=I+IP
      J=J+1
      M=BIN(I)
      IF(I .EQ. 256 .OR. KBIT .LE. ITH)GOTO 60
      M1=BIN(I+1)
      IF(M.GT.3 .OR. M1 .GT. 3)GOTO 60
      IF(M .EQ. 1 .AND. M1 .GT. 3)GOTO 60
      IF(M .EQ. 1 .AND. M1 .EQ. 2)GOTO 60


      IF(M .EQ. 1 .AND. M1 .EQ.
      IF(M .EQ. 2 .AND. M1 .EQ.
      IF(M .EQ. 3 .AND. M1 .EQ.
      IF((M .EQ. 1) .AND. (M1 .EQ
      IF((M .EQ. 2) .AND. (M1 .EQ.
      IF((M .EQ. 2) .AND. (M1 .EQ.
      IF((M .EQ. 3) .AND. (M1 .EQ.
      IF((M .EQ. 3) .AND. (M1 .EQ.
      3)GOTO 60
      2)GOTO 60

      IP=2
      BOUT(J)=KCOD
      KBIT=KBIT-2
      DO 78 ILOOP=1,4   ! ADD 'NPARI' BITS
      IBCNT=IBCNT+1   ! TO BUFFER EVERY 'INFOB'
      IF(MOD(IBCNT,INFOB) .NE. 0)GOTO 78
      KBIT=KBIT+NPARI
      IBCNT=0
78    CONTINUE
      IF(KBIT.GT.1024)KBIT=1024
      IF(KBIT.LT.0)KBIT=0
      IF(I.NE.1)GOTO 80
      TBU(I)=LAST
      GOTO 81
80    TBU(I)=TBU(I-1)
81    TBU(I+1)=KBIT
      GOTO 20
60    IP=1
      BOUT(J)=IVECT(M)
      KBIT=KBIT+1
      DO 62 ILOOP=1,4
      IBCNT=IBCNT+1
      IF((MOD(IBCNT,INFOB)) .NE. 0)GOTO 62
      KBIT=KBIT+NPARI
      IBCNT=0
62    CONTINUE
      IF(KBIT.GT.1024)KBIT=1024
      IF(KBIT.LT.0)KBIT=0
      TBU(I)=KBIT
20    CONTINUE
      WRITE(5,300)I,BIN(I),(I+1),BIN(I+1),J,BOUT(J),KCOD,TBU(J)
300   FORMAT(2X,'BIN(',I3,')=',I3,2X,'BIN(',I3,')=',I3,2X,'BOUT(',
     1 I5,2X,'KCOD=',I2,2X,'KBIT=',I4)
      IF(I.EQ.256 .OR. (I.EQ.255 .AND. IP .EQ. 2))GOTO 70
      GOTO 13
70    WRITE(3,200)TBU
200   FORMAT(16I5)
      LAST=TBU(256)
      IF(J .LT. 256)GOTO 10
      WRITE(2,150)BOUT1
150   FORMAT(16I2)
      NEWBLO=NEWBLO+1
      KK=J-256
      IF(KK .EQ. 0)GOTO 93
      DO 91 II=1,KK
91    BOUT(II)=BOUT(256+II)
93    J=KK
10    CONTINUE
C
      IF(J .LT. 256) GOTO 410
      IF(J .GT. 256) GOTO 550
      GOTO 333
C
410   LL=256-J
      DO 420 L=1,LL
420   BOUT(J+L)=IVECT(1)
430   WRITE(2,150)BOUT1
      NEWBLO=NEWBLO+1
      GOTO 333
C
550   DO 560 I=1,256
560   BOUT1(I)=IVECT(1)
      KK=J-256
      DO 570 II=1,KK
570   BOUT(II)=BOUT(256+II)
      GOTO 430
C
333   CLOSE(UNIT=1)
      CLOSE(UNIT=2)
      CLOSE(UNIT=3)
C
      OPEN(UNIT=1,TYPE='NEW',NAME='NEWBLO.DAT')
      WRITE(1,666)NEWBLO
666   FORMAT(I3)
      CLOSE(UNIT=1)
C
      STOP
      END
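Both coding branches of ENCO end with the same bookkeeping loop: the four bits of a codeword are counted one at a time, NPARI check bits are charged to the buffer each time INFOB information bits have accumulated, and the buffer count is then held between 0 and 1024. The Python sketch below reproduces only that loop. The function name and argument names are hypothetical, and the per-codeword change to KBIT that precedes the loop in the listing is left to the caller.

```python
def add_codeword_bits(kbit, ibcnt, infob, npari, bits_per_codeword=4,
                      cap=1024):
    """Charge Hamming check bits for one codeword; returns (kbit, ibcnt).

    kbit  -- buffer count before the codeword's check bits are added
    ibcnt -- information bits counted since the last check-bit charge
    """
    for _ in range(bits_per_codeword):
        ibcnt += 1
        if ibcnt % infob == 0:
            kbit += npari        # one Hamming block is complete
            ibcnt = 0
    kbit = min(max(kbit, 0), cap)
    return kbit, ibcnt


print(add_codeword_bits(100, 24, infob=26, npari=5))
```

With a (31,26) code, for example, INFOB would be 26 and NPARI 5, so the buffer count rises by five every twenty-six information bits.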


C     *  CHANNEL ERROR
C     *  WITH PARITY CHECK AND/OR
C     *  DOUBLE ERROR CORRECTION
C
C -- PROGRAM NAME: CHER2631.FTN
C
C -- THIS PROGRAM READS CODE.DAT FILE AND INTRODUCES RANDOM ERRORS
C -- IN THE BIT STREAM ACCORDING TO BIT ERROR RATE (BER) IN PERCENT.
C -- THEN IT WRITES A NEW FILE WITH CODEWORD AFFECTED BY CHANNEL ERRORS.
C -- AS OPTION, IT PERFORMS PARITY CHECK CORRECTION AND/OR DOUBLE
C -- ERROR CORRECTION.
C -- THE FRAME LOOP OF 32 BITS COULD BE ASSOCIATED TO A (31,26)
C -- HAMMING CODE WITH 6*4 + 1 SYNC. + 1 DOUBLE ERR. DETECT. =26
C
      INTEGER*2 BIN(256),BOUT(256),HD(40),FLAG,ISAM(30),DOUB
      TYPE *,' ENTER BER(%)'
      ACCEPT *,BER
      TYPE *,' PARITY CHECK = 1 : NO PAR. CHECK = 0'
      ACCEPT *,FLAG
      TYPE *,' DOUBLE ERR. CORRECT. = 1 : NO ERR. CORRECT. = 0'
      ACCEPT *,DOUB
      IDO=0
      ICS=0
      ICD=0
      ICT=0
      ICOR=0
      IF(BER .LT. 1.E-07)BER=0.0
      EXT = BER/100.
      K1 = 773
      K2 = 119
      CONT=0.
      IUNO=1
      OPEN(UNIT=1,TYPE='OLD',NAME='CODE.DAT')
      OPEN(UNIT=2,TYPE='NEW',NAME='ERCO.DAT',CARRIAGECONTROL='LIST')
      OPEN(UNIT=3,TYPE='OLD',NAME='NEWBLO.DAT')
      READ(3,5)NBLO
5     FORMAT(I3)
      READ(1,9)NSENT,IRATE,NBBS,IUPPR,ILOWR,NTERMS,HD
9     FORMAT(6I5,10X,40A1)
      TYPE *,' TYPE HEADER FOR CHERR OUTPUT FILE:'
      ACCEPT 11,HD
11    FORMAT(40A1)
      WRITE(2,9)NSENT,IRATE,NBLO,IUPPR,ILOWR,NTERMS,HD
      DO 13 JJ=1,10000
13    CALL RANDU(K1,K2,X)
      DO 10 IB=1,NBLO   ! BLOCK LOOP
      READ(1,100,END=345)BIN
100   FORMAT(16I2)
C
      DO 20 IR=1,32   ! SAMPLE LOOP
      IKK=0
      DO 22 JG=1,30
22    ISAM(JG)=0
      L=0
C
      DO 25 IZ=1,8   ! FRAME LOOP 8*4
      I=IZ+8*(IR-1)
      M=BIN(I)
      DO 30 II=1,4
      CALL RANDU(K1,K2,X)
      IF(X .GT. EXT) GOTO 30
      J = (2. ** (II-1) + 0.5)
      M = IEOR(M,J)
      L = L + 1
      ISAM(L)=I
      IKK=IKK+1
      CONT=CONT+1.
30    CONTINUE
      BOUT(I)=M
25    CONTINUE
      IF(IKK.NE.1)GOTO 65
      ICS=ICS+1
      IF(FLAG.EQ.0)GOTO 20
      IX=ISAM(1)
      BOUT(IX)=BIN(IX)
      ICOR=ICOR+1
      GOTO 20
C
65    IF(IKK.EQ.0)GOTO 20
      IF(IKK.EQ.2 .AND. DOUB.EQ.1)GOTO 75
      GOTO 80
75    IDO=IDO+1
      IX=ISAM(1)
      BOUT(IX)=BIN(IX)
      IX=ISAM(2)
      BOUT(IX)=BIN(IX)
80    IKK=IAND(IKK,IUNO)
      IF(IKK.EQ.0)ICD=ICD+1
      IF(IKK.EQ.1)ICT=ICT+1
      IF(IKK.EQ.1 .AND.FLAG.EQ.1)GOTO 99
      GOTO 20
C
99    OK1=K1
      OK2=K2
      DO 700 IPLUS=1,50   ! EXTRA LOOPS
      DO 710 IZ=1,8   ! FRAME LOOP
      I = IZ+8*(IR-1)
C
      DO 720 KF=1,30
      IPR=ISAM(KF)
      IF(IPR.EQ.I)GOTO 710
720   CONTINUE
C
      M=BIN(I)
      DO 730 II=1,4
      CALL RANDU(K1,K2,X)
      IF(X.GT.EXT)GOTO 730
      J=2.**(II-1) + 0.5
      M=IEOR(M,J)
      CONT=CONT+1.
      GOTO 780
730   CONTINUE
710   CONTINUE
700   CONTINUE
C
780   BOUT(I)=M   ! INSERT ONE ADDITIONAL ERROR : PARITY CHECK FAILS
      K1=OK1
      K2=OK2
C
20    CONTINUE
C
      WRITE(2,150) BOUT
150   FORMAT(16I2)
10    CONTINUE
345   CONTINUE
      TYPE *,' TOTAL # OF ERRORS=',CONT
      TYPE *,' TOTAL # OF SINGLE ERRORS CORRECTED= ',ICOR
      TYPE *,' TOTAL # OF DOUBLE ERRORS CORRECTED= ',IDO
      TYPE *,' TOTAL # OF SINGLE ERR./FRAME= ',ICS
      TYPE *,' TOTAL # OF EVEN ERR./FRAME= ',ICD
      TYPE *,' TOTAL # OF ODD ERR./FRAME= ',ICT
      RESI=CONT-ICOR-2*IDO
      TYPE *,' # OF RESIDUAL ERRORS= ',RESI
      IF(CONT .NE. 0.)BERNEW=(RESI/CONT)*BER
      IF(CONT .NE. 0.)TYPE *,' NEW BER AFTER ERR CORRECTION= ',BERNEW
      CLOSE(UNIT=1)
      CLOSE(UNIT=2)
      CLOSE(UNIT=3)
      STOP
      END
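The inner DO 30 loop of the channel error program flips each of the four bits of a codeword independently with probability BER/100. The Python sketch below shows that step for one codeword. The function name and the `rng` argument are assumptions; any object with a `random()` method returning values in [0, 1) can stand in for the listing's RANDU calls.

```python
import random


def inject_bit_errors(codeword, ber_percent, rng):
    """Flip each of the 4 bits with probability ber_percent/100.

    Returns (corrupted codeword, number of bits flipped).
    """
    threshold = ber_percent / 100.0
    flipped = 0
    for bit in range(4):
        if rng.random() > threshold:
            continue                 # no channel error on this bit
        codeword ^= 1 << bit         # same effect as IEOR(M, 2**(II-1))
        flipped += 1
    return codeword, flipped


rng = random.Random(773)             # arbitrary seed for a repeatable run
print(inject_bit_errors(5, 1.0, rng))
```

Counting the flips per frame is what lets the program decide afterwards whether a frame holds a single error, a double error, or an odd or even number of errors.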


CHANNEL  ERROR 
WITH  PARITY  CHECK  AND/OR 
DOUBLE  ERROR  CORRECTION 


PROGRAM  NAME:  CHER5763.FTN 


THIS  PROGRAM  READS  CODE . DAT  FILE  AND  INTRODUCES  RANDOM  ERRORS 
IN  THE  BIT  STREAM  ACCORDING  TO  BIT  ERROR  RATE  (BER)  IN  PERCENT. 

THEN  IT  WRITES  A  NEW  FILE  WITH  CODEWORD  AFFECTED  BY  CHANNEL  ERRORS. 
AS  OPTION,  IT  PERFORMS  PARITY  CHECK  CORRECTION  AND/OR  DOUBLE 
ERROR  CORRECTION. 


      INTEGER*2 BIN(256),BOUT(256),HD(40),FLAG,ISAM(30),DOUB
      TYPE *,' ENTER BER(%)'
      ACCEPT *,BER
      TYPE *,' PARITY CHECK = 1 ; NO PAR. CHECK = 0'
      ACCEPT *,FLAG
      TYPE *,' DOUBLE ERR. CORRECT. = 1 ; NO ERR. CORRECT. = 0'
      ACCEPT *,DOUB
      IDO=0
      ICS=0
      ICD=0
      ICT=0
      ICOR=0
      IF(BER .LT. 1.E-07)BER=0.0
      EXT = BER/100.
      K1 = 773
      K2 = 119
      CONT=0.
      IUNO=1
C
      OPEN(UNIT=1,TYPE='OLD',NAME='CODE.DAT')
      OPEN(UNIT=2,TYPE='NEW',NAME='ERCO.DAT',CARRIAGECONTROL='LIST')
      OPEN(UNIT=3,TYPE='OLD',NAME='NEWBLO.DAT')
      READ(3,5)NBLO
5     FORMAT(I3)
      READ(1,9)NSENT,IRATE,NBBS,IUPPR,ILOWR,NTERMS,HD
9     FORMAT(6I5,10X,40A1)
      TYPE *,' TYPE HEADER FOR CHERR OUTPUT FILE:'
      ACCEPT 11,HD
11    FORMAT(40A1)
      WRITE(2,9)NSENT,IRATE,NBLO,IUPPR,ILOWR,NTERMS,HD
C
      DO 13 JO=1,10000
13    CALL RANDU(K1,K2,X)
C
      DO 10 IB=1,NBLO                    ! BLOCK LOOP
      READ(1,100,END=345)BIN
100   FORMAT(16I2)
C
      DO 20 IR=1,16                      ! SAMPLE LOOP
      IKK=0
      DO 22 JG=1,30
22    ISAM(JG)=0
      L=0
C
      DO 25 IZ=1,16                      ! FRAME LOOP 16*4
      I = IZ+16*(IR-1)
      M=BIN(I)
      DO 30 II=1,4
      CALL RANDU(K1,K2,X)
      IF(X .GT. EXT) GOTO 30
      J = (2. ** (II-1) + 0.5)
      M = IEOR(M,J)
      L=L+1
      ISAM(L)=I
      IKK=IKK+1
      CONT=CONT+1.
30    CONTINUE
      BOUT(I)=M
25    CONTINUE
C
      IF(IKK.NE.1)GOTO 65
      ICS=ICS+1
      IF(FLAG.EQ.0)GOTO 20
      IX=ISAM(1)
      BOUT(IX)=BIN(IX)
      ICOR=ICOR+1
      GOTO 20
C
65    IF(IKK.EQ.0)GOTO 20
      IF(IKK.EQ.2 .AND. DOUB.EQ.1)GOTO 75
      GOTO 80
75    IDO=IDO+1
      IX=ISAM(1)
      BOUT(IX)=BIN(IX)
      IX=ISAM(2)
      BOUT(IX)=BIN(IX)
80    IKK=IAND(IKK,IUNO)
      IF(IKK.EQ.0)ICD=ICD+1
      IF(IKK.EQ.1)ICT=ICT+1
      IF(IKK.EQ.1 .AND. FLAG.EQ.1)GOTO 99
      GOTO 20
C
C
99    OK1=K1
      OK2=K2
      DO 700 IPLUS=1,50                  ! EXTRA LOOPS
      DO 710 IZ=1,16                     ! FRAME LOOP
      I = IZ+16*(IR-1)
C
      DO 720 KF=1,30
      IPR=ISAM(KF)
      IF(IPR.EQ.I)GOTO 710
720   CONTINUE
C
      M=BIN(I)
      DO 730 II=1,4
      CALL RANDU(K1,K2,X)
      IF(X.GT.EXT)GOTO 730
      J=2.**(II-1)+0.5
      M=IEOR(M,J)
      CONT=CONT+1.
      GOTO 780
730   CONTINUE
710   CONTINUE
700   CONTINUE
C
780   BOUT(I)=M                          ! INSERT ONE ADDITIONAL ERROR
C                                        : PARITY CHECK FAILS
      K1=OK1
      K2=OK2
C
20    CONTINUE
C
      WRITE(2,150)BOUT
150   FORMAT(16I2)
10    CONTINUE
345   CONTINUE
      TYPE *,' TOTAL # OF ERRORS=',CONT
      TYPE *,' TOTAL # OF SINGLE ERRORS CORRECTED=',ICOR
      TYPE *,' TOTAL # OF DOUBLE ERRORS CORRECTED=',IDO
      TYPE *,' TOTAL # OF SINGLE ERR./FRAME=',ICS
      TYPE *,' TOTAL # OF EVEN ERR./FRAME=',ICD
      TYPE *,' TOTAL # OF ODD ERR./FRAME=',ICT
      RESI=CONT-ICOR-2*IDO
      TYPE *,' # OF RESIDUAL ERRORS=',RESI
      IF(CONT .NE. 0.)BERNEW=(RESI/CONT)*BER
      IF(CONT .NE. 0.)TYPE *,' NEW BER AFTER CORRECTION=',BERNEW
      CLOSE(UNIT=1)
      CLOSE(UNIT=2)
      CLOSE(UNIT=3)
      STOP
      END
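
The error-injection step of CHER5763.FTN, together with its single-error correction option, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the report's code: the name `inject_errors`, its parameters, and the use of Python's `random` module in place of RANDU are hypothetical. As in the listing, the "parity check" is simulated by an oracle that remembers which codeword was hit. The double-error branch and the extra error inserted when an odd error count defeats the parity check are omitted.

```python
import random


def inject_errors(codewords, ber_percent, frame_len=16, bits=4,
                  parity_check=True, seed=0):
    """Flip each bit with probability ber_percent/100, frame by frame.

    A frame is frame_len codewords of `bits` bits each (16*4 in the listing).
    If exactly one bit error falls in a frame and parity_check is set, the
    affected codeword is restored, mimicking single-error correction.
    """
    rng = random.Random(seed)          # assumption: stands in for RANDU
    p = ber_percent / 100.0
    out = list(codewords)
    stats = {"errors": 0, "single_corrected": 0}
    for start in range(0, len(codewords), frame_len):
        hit = []                       # indices of codewords hit in this frame
        for i in range(start, min(start + frame_len, len(codewords))):
            for b in range(bits):
                if rng.random() < p:   # strict '<' so that BER = 0 never flips
                    out[i] ^= 1 << b
                    hit.append(i)
                    stats["errors"] += 1
        if len(hit) == 1 and parity_check:
            out[hit[0]] = codewords[hit[0]]
            stats["single_corrected"] += 1
    return out, stats
```

Calling `inject_errors(cw, 1.0)` on a list of 4-bit codewords returns the corrupted list and the error tallies; residual errors after correction correspond to `errors - single_corrected`.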

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c 
C
C      DECODER
C
C      PROGRAM NAME: DEC03.FTN
C
C  THIS PROGRAM READS CODEWORDS WITH CHANNEL ERRORS AND DECODES THEM
C  AND WRITES CORRESPONDING QUANTIZER LEVELS IN A NEW FILE. IT ALSO
C  COMPUTES RECEIVER BUFFER OCCUPANCY.
C
      REAL SAMP(1024),LAST
      INTEGER*2 BIN(256),BOUT(1024),ISAMP(256),FNAME(16),KAR(2),BOUT1(256)
      INTEGER*2 HD(40),HD1(40)
      EQUIVALENCE (BOUT(1),BOUT1(1))
C
      TYPE *,' ENTER THE CODEWORD FILENAME FOR DECODER:'
      ACCEPT 4,FNAME
4     FORMAT(16A2)
C
      OPEN(UNIT=1,TYPE='OLD',NAME=FNAME)
      OPEN(UNIT=2,TYPE='NEW',NAME='DECO.DAT',CARRIAGECONTROL='LIST')
      OPEN(UNIT=3,TYPE='NEW',NAME='RBUF.DAT',CARRIAGECONTROL='LIST')
      OPEN(UNIT=4,TYPE='OLD',NAME='NEWBLO.DAT')
      READ(4,5)NBLO
5     FORMAT(I3)
C
      READ(1,9)NSENT,IRATE,NBOLD,IUPPR,ILOWR,NTERMS,HD
      NSAMP=NBLO*256
9     FORMAT(6I5,10X,40A1)
      TYPE *,' TYPE HEADER FOR DECODER OUTPUT FILE:'
      ACCEPT 11,HD
11    FORMAT(40A1)
      WRITE(2,9)NSENT,IRATE,NSAMP,IUPPR,ILOWR,NTERMS,HD
      TYPE *,' TYPE HEADER FOR RECEIVER BUFFER OUTPUT FILE:'
      ACCEPT 11,HD1
      WRITE(3,9)NSENT,IRATE,NSAMP,IUPPR,ILOWR,NTERMS,HD1
C
      TYPE *,' INITIALIZE THE RECEIVER SAMPLE BUFFER:'
      ACCEPT *,WORD
C     LAST=300.                          ! RECEIVER BUFFER INITIALIZ.
C     WORD=300.
      LAST=WORD
      IND=1
      IEND=0
      NEWB=0
C
      DO 10 IB=1,NBLO
      READ(1,100,END=345)BIN
100   FORMAT(16I2)
      DO 20 I=1,256
      L=BIN(I)
      IF(L .EQ. 8)GOTO 35
      IF(L .EQ. 10)GOTO 40
      IF(L .EQ. 13)GOTO 45
      IF(L .EQ. 9)GOTO 50
      IF(L .EQ. 12)GOTO 53
      IF(L .EQ. 0)BOUT(IND)=1
      IF(L .EQ. 1)BOUT(IND)=2
      IF(L .EQ. 2)BOUT(IND)=4
      IF(L .EQ. 3)BOUT(IND)=6
      IF(L .EQ. 4)BOUT(IND)=3
      IF(L .EQ. 5)BOUT(IND)=5
      IF(L .EQ. 6)BOUT(IND)=7
      IF(L .EQ. 7)BOUT(IND)=9
      IF(L .EQ. 11)BOUT(IND)=8
      IF(L .EQ. 14)BOUT(IND)=11
      IF(L .EQ. 15)BOUT(IND)=10
      WORD=WORD+0.25-0.3333              ! 1 SAMPLE/4BITS=0.25
      IF(WORD .GT. 350.)WORD=350.
      IF(WORD .LT. 0.)WORD=0.
      SAMP(IND)=WORD
      IND=IND+1
      GOTO 20
35    KAR(1)=1
      KAR(2)=1
      GOTO 60
40    KAR(1)=2
      KAR(2)=1
      GOTO 60
45    KAR(1)=2
      KAR(2)=2
      GOTO 60
50    KAR(1)=3
      KAR(2)=1
      GOTO 60
53    KAR(1)=3
      KAR(2)=3
C
60    BOUT(IND)=KAR(1)
      BOUT(IND+1)=KAR(2)
      WORD=WORD+0.5-0.3333               ! 2SAMPLES/4BITS=0.5
      IF(WORD .GT. 350.)WORD=350.
      IF(WORD .LT. 0.)WORD=0.
      IF(IND .NE. 1)GOTO 80
      SAMP(IND)=LAST
      GOTO 81
80    SAMP(IND)=SAMP(IND-1)
81    SAMP(IND+1)=WORD
      IND=IND+2
C
20    CONTINUE                           ! SAMPLE LOOP
C
149   WRITE(2,150)BOUT1
150   FORMAT(16I2)
      NEWB=NEWB+1
      DO 55 I=1,256
55    ISAMP(I)=SAMP(I)+0.5
      WRITE(3,151)ISAMP
151   FORMAT(16I5)
C
      IF(IND .LE. 257) GOTO 75
      IF(IB.EQ.NBLO)IEND=1
      IF(IND .GT. 1024)STOP 'BUFFER TOO SHORT'
      KK=IND-257
      DO 70 I=1,KK
      BOUT(I)=BOUT(256+I)
70    SAMP(I)=SAMP(256+I)
      IND=KK+1
      IF(IND .GT. 257)GOTO 149
      GOTO 10
75    IND=1
      LAST=SAMP(256)
C
10    CONTINUE                           ! BLOCK LOOP
C
      IF(IND .LT. 257 .AND. IEND .EQ. 0)GOTO 345
      WRITE(5,907)IND
907   FORMAT(1H ,' SAMPLES IN THE TAIL = ',I6)
      IF(IND .GE. 257)GOTO 430
      DO 301 I=IND,256
      BOUT(I)=1                          ! FILL WITH SILENCE
      WORD=WORD+0.25-0.3333
      IF(WORD .GT. 350.)WORD=350.
      IF(WORD .LT. 0.)WORD=0.
301   SAMP(I)=WORD
C
430   WRITE(2,150)BOUT1
      DO 900 J=1,256
900   ISAMP(J)=SAMP(J)+0.5
      WRITE(3,151)ISAMP
      NEWB=NEWB+1                        ! FINAL # OF BLOCKS
345   CLOSE(UNIT=1)
      CLOSE(UNIT=2)
      CLOSE(UNIT=3)
      CLOSE(UNIT=4)
      NSAMP=NEWB*256
      OPEN(UNIT=1,TYPE='NEW',NAME='RECSAM.DAT',CARRIAGECONTROL='LIST')
      WRITE(1,888)NSAMP
888   FORMAT(I5)
      STOP
      END
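
The codeword-to-level mapping and the receiver buffer bookkeeping of DEC03.FTN can be summarized in a short Python sketch. The two tables are transcribed from the IF chain of the listing. The function name `decode`, the default starting occupancy of 300 (taken from the commented-out initialization; the program actually prompts for it), and recording one occupancy value per codeword are assumptions made for illustration.

```python
# 4-bit codewords that carry one quantizer level.
SINGLE = {0: 1, 1: 2, 2: 4, 3: 6, 4: 3, 5: 5, 6: 7, 7: 9,
          11: 8, 14: 11, 15: 10}
# 4-bit codewords that carry two quantizer levels.
PAIR = {8: (1, 1), 10: (2, 1), 13: (2, 2), 9: (3, 1), 12: (3, 3)}


def decode(codewords, word=300.0, drain=0.3333, cap=350.0):
    """Return (levels, occupancy), with one occupancy value per codeword.

    Each decoded sample adds 0.25 to the buffer occupancy and each codeword
    drains 0.3333; the result is clamped to [0, cap] as in the listing.
    """
    levels, occupancy = [], []
    for c in codewords:
        out = PAIR[c] if c in PAIR else (SINGLE[c],)
        levels.extend(out)
        word = min(cap, max(0.0, word + 0.25 * len(out) - drain))
        occupancy.append(word)
    return levels, occupancy
```

For example, `decode([0, 8, 15])` yields the levels `[1, 1, 1, 10]`: the middle codeword expands to two samples, so the occupancy rises on that step and falls on the other two.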
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C 

C
C      TDHS - ARC SYSTEM
C
C      RECEIVER
C
C      PITCH IS EXTRACTED AT THE RECEIVER
C
C  QUANTIZER OUTPUT FOR FREQUENCY DIVIDED SPEECH IS RECEIVED. IT IS
C  INVERSE QUANTIZED AND PASSED THROUGH ARC RECEIVER WHICH GIVES
C  RECONSTRUCTED FREQUENCY DIVIDED SPEECH (YHAT). FREQUENCY
C  MULTIPLICATION OPERATION IS PERFORMED ON YHAT TO GET SHAT. TO DO THIS
C  PITCH PERIOD IS NEEDED, THE VALUES OF WHICH ARE READ FROM PITCH.DAT
C  FILE. PARAMETERS AT TRANSMITTER AND THAT AT RECEIVER ARE THE SAME
C  AND ARE READ FROM PARAMETER FILE.
C
C  PROGRAM NAME: RECVR.FTN
C  DATE: JUNE 30, 1981
C
C  THIS PROGRAM ASKS FOR
C     1. PARAMETER FILENAME: PARA.DAT
C     2. PITCH ESTIMATOR OPTION.
C     3. KBLK, ITMIN, ITMAX
C  AND PRODUCES OUTPUT FILE
C     1. SHAT.DAT (OUTPUT SPEECH)
C     2. ZHAT.DAT (RECONSTRUCTED COMPRSD SP)
C
      INTEGER HD(40),Q(7800),FNAME1(16),PITCH(400),P1,P2,P3,P4,SQ(16)
      INTEGER*2 FNAME2(16)
      DIMENSION YHAT(364),SHAT(364),H(400),IZHAT(256)
      DIMENSION IBUFF1(512),IBUFF2(512)
      COMMON /PRED/G,NO,RMSMIN,ALP,AINV,KQ,NSPSAM,A(12),DVHAT(12),EV,
     1 ISTAT1(40),EP,ALAD
      COMMON /RMS/RMS
      COMMON /CLIPP/CLPP
      MODN(K) = K - (K-1)/16 * 16
      MODM(K) = K - (K-1)/256 * 256
C
      OPEN(UNIT=8,TYPE='OLD',NAME='OPT2.DAT')
      READ(8,10)FNAME1
10    FORMAT(16A2)
C
      OPEN(UNIT=1,TYPE='OLD',NAME='DECO.DAT')
      OPEN(UNIT=3,TYPE='NEW',NAME='ZHAT.DAT',CARRIAGECONTROL='LIST')
      OPEN(UNIT=4,TYPE='OLD',NAME=FNAME1)
C
      READ(1,20)NSENT,IRATE,NSAMP,IUPPR,ILOWR,NTERMS,HD
20    FORMAT(6I5,10X,40A1)
      OPEN(UNIT=2,TYPE='OLD',NAME='RECSAM.DAT')
      READ(2,888)NSAMP
888   FORMAT(I5)
      NSPSAM = 2 * NSAMP
      TYPE *,' TYPE THE HEADER FOR RECEIVER OUTPUT FILE:'
      ACCEPT 22,HD
22    FORMAT(40A1)
      WRITE(3,20)NSENT,IRATE,NSAMP,IUPPR,ILOWR,NTERMS,HD




C
      READ(1,29,END=40)(Q(J),J=1,NSAMP)
29    FORMAT(16I2)
30    FORMAT(16I5)
40    CONTINUE
C
      READ(8,*)CLPP
      READ(8,*)IOPT2
      READ(8,*)KBLK1,ITMIN,ITMAX
C
      CALL INSTRT
      NRMNG=MOD(NSAMP,256)
      DO 65 IBGN=1,NSAMP
      CALL ARCR(Q(IBGN),YY)
      IR1=MODM(IBGN)
      IF(YY.GT.0.0)YY=YY+0.5
      IF(YY.LT.0.0)YY=YY-0.5
      IF(YY.GT.2047.0)YY=2047.0
      IF(YY.LT.-2048.0)YY=-2048.0
      IF(IBGN.NE.1 .AND. IR1.EQ.1)WRITE(3,30)IZHAT
      IZHAT(IR1)=IFIX(YY)
      IF(IBGN.EQ.NSAMP)WRITE(3,30)(IZHAT(K),K=1,NRMNG)
65    CONTINUE
      REWIND 3
      READ(3,20)NSENT,IRATE,NSAMP,IUPPR,ILOWR,NTERMS,HD
      NSPSAM=2*NSAMP
C
      DO 665 I3=1,300
      Q(I3)=0
665   CONTINUE
C
      READ(3,30,END=68)(Q(J),J=1,NSAMP)
68    CONTINUE
      CLOSE(UNIT=3)
      OPEN(UNIT=3,TYPE='NEW',NAME='SHAT.DAT',CARRIAGECONTROL='LIST')
      WRITE(3,20)NSENT,IRATE,NSPSAM,IUPPR,ILOWR,NTERMS,HD
C
C  Q BUFFER HAS THE RECEIVED COMPRESSED SPEECH. PITCH WILL BE
C  EXTRACTED FROM IT AND WILL BE USED FOR EXPANSION.
C
      NUMP = 0
      IOFF = 0
      ICNTT = 0
      JJ = 1
      NADO = 0
      NUM1 = KBLK1 + ITMAX
C
      DO 960 I=1,NUM1
      IBUFF1(I) = Q(300+I)
960   CONTINUE
      IOFF = IOFF + NUM1
      ICNTT = IOFF
      NSINB=NUM1
998   CONTINUE
      CALL PICH(IBUFF1,NPR,ITMAX,ITMIN,KBLK1,NUMP,NSINB,IOPT2)
      PITCH(JJ) = NPR
      JJ = JJ + 1
      NADD = NADD + NPR
      NUM2 = NUM1 - NPR
C
      DO 980 I = 1,NUM2
      IBUFF2(I) = IBUFF1(NPR+I)
980   CONTINUE
      NSP = NUM1 - NUM2
      DO 990 I = 1,NSP
      ICNTT = ICNTT + 1
      IF(ICNTT .GT. NSAMP) GOTO 999
      IBUFF2(NUM2+I) = Q(300+IOFF+I)
990   CONTINUE
      NSINB=NUM2+NSP
      IOFF = IOFF + NSP
C
      CALL PICH(IBUFF2,NPR,ITMAX,ITMIN,KBLK1,NUMP,NSINB,IOPT2)
      PITCH(JJ) = NPR
      JJ = JJ + 1
      NADD = NADD + NPR
      NUM2 = NUM1 - NPR
C
      DO 9100 I=1,NUM2
      IBUFF1(I) = IBUFF2(NPR+I)
9100  CONTINUE
      NSP = NUM1 - NUM2
      DO 9110 I=1,NSP
      ICNTT = ICNTT + 1
      IF(ICNTT .GT. NSAMP)GOTO 999
      IBUFF1(NUM2+I)=Q(300+IOFF+I)
9110  CONTINUE
      IOFF = IOFF + NSP
      NSINB=NUM2+NSP
      GOTO 998
999   CONTINUE
C
      P2 = 300
      DO 500 ICNT = 1,NUMP
      NP = PITCH(ICNT)
      P1 = P2 - NP
      P3 = P2 + NP
      P4 = P3 + NP
      P2 = P3
      DO 80 I = 0,(P4-P1-1)
      IP11 = I + P1 + 1
      IF(IP11 .GT. NSAMP) GOTO 600
      YHAT(I+1) = Q(IP11)
80    CONTINUE
83    CONTINUE
      NP2 = NP + NP
      DO 100 I = 1,NP2
      H(I) = 1 - FLOAT(I-1)/FLOAT(NP2-1)
100   CONTINUE
      DO 120 I = 1,NP2
      II = I
      SHAT(II) = YHAT(II) + H(II) * (YHAT(II+NP) - YHAT(II))
      IARG = NADO + I
      N1 = MODN(IARG)
      XX = SHAT(II)
      IF(XX .GT. 0.0) XX=XX+0.5
      IF(XX .LT. 0.0) XX=XX-0.5
      IF(XX .GT. 2047.0)XX=2047.0
      IF(XX .LT. -2048.0) XX=-2048.0
      SQ(N1) = IFIX(XX)
      IF(N1 .EQ. 16) WRITE(3,30)SQ
120   CONTINUE
      NADO = NADO + NP + NP
500   CONTINUE
C
600   TYPE *,' TYPE THE HEADER FOR PRINTOUT. (40 CHAR ONLY)'
      ACCEPT 775,HD
775   FORMAT(40A1)
      WRITE(6,774)HD
774   FORMAT(///,40A1,//)
      WRITE(6,776)
776   FORMAT(////6X,' SAMPLE NUMBER ',6X,' PITCH PERIOD '//)
      ISTRT = 1
      IEND = KBLK1 + ITMAX
      DO 777 I=1,NUMP
      IP=PITCH(I)
      WRITE(6,778)ISTRT,IEND,IP
      ISTRT=ISTRT+IP
      IEND=IEND+IP
777   CONTINUE
C  (SEPARATOR LITERAL IN FORMAT 778 IS ILLEGIBLE IN THE SOURCE; '-' ASSUMED)
778   FORMAT(6X,I5,1X,'-',1X,I5,10X,I3)
      STOP
      END
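
The frequency multiplication (time expansion) loop of RECVR.FTN turns three pitch periods of compressed speech YHAT into two pitch periods of output SHAT with a linearly decaying weight H. A minimal Python sketch of that step and of the 12-bit rounding that follows it is given below. The function names `tdhs_expand` and `round_clip_12bit` are hypothetical, indices are shifted to start at 0, and the pitch period is assumed to be known already.

```python
def tdhs_expand(yhat, period):
    """Expand 3*period compressed samples into 2*period output samples.

    Follows SHAT(I) = YHAT(I) + H(I)*(YHAT(I+NP) - YHAT(I)) with
    H(I) = 1 - (I-1)/(2*NP-1), rewritten for 0-based indices.
    """
    n2 = 2 * period
    if len(yhat) < 3 * period:
        raise ValueError("need at least three pitch periods of input")
    out = []
    for i in range(n2):
        h = 1.0 - i / (n2 - 1)         # weight falls linearly from 1 to 0
        out.append(yhat[i] + h * (yhat[i + period] - yhat[i]))
    return out


def round_clip_12bit(x):
    """Round half away from zero, then clamp to [-2048, 2047]."""
    if x > 0.0:
        x += 0.5
    elif x < 0.0:
        x -= 0.5
    x = min(2047.0, max(-2048.0, x))
    return int(x)                      # truncates toward zero, like IFIX
```

At the start of each output segment the weight is 1, so the output equals the sample one pitch period ahead; at the end the weight is 0 and the output equals the unshifted sample.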


C      INVERSE QUANTIZER
C
      SUBROUTINE INVQUA(QQ,EQ)
      INTEGER QQ
      COMMON /QUAN/T(20),OUT(20),EXPN(20),SIZE,SMIN,NQ
      COMMON /RMS/RMS
      ISIGN=1
      IF(MOD(QQ,2).EQ.0)ISIGN=-1
      J=(QQ+2)/2
      EQ=ISIGN*OUT(J)*SIZE
      SIZE=EXPN(J)*SIZE
      SIZE=AMAX1(SIZE,RMS*SMIN)
      RETURN
      END
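
The adaptive step-size logic of INVQUA can be illustrated with a small Python sketch. The class name `InverseQuantizer` and the table values used in the usage example are hypothetical; only the index rule `J = (QQ+2)/2`, the sign rule (even levels are negative), and the step-size update with its floor of `RMS*SMIN` are taken from the listing. Passing `rms` as an argument replaces the COMMON block.

```python
class InverseQuantizer:
    """Sketch of an adaptive inverse quantizer with expanding step size."""

    def __init__(self, out, expn, size=100.0, smin=0.125):
        self.out = list(out)     # reconstruction levels, one per magnitude
        self.expn = list(expn)   # step-size multipliers, one per magnitude
        self.size = size         # current step size (100. in INSTRT)
        self.smin = smin         # floor factor; this default is an assumption

    def step(self, qq, rms):
        """Decode level index qq (1-based) and adapt the step size."""
        sign = -1.0 if qq % 2 == 0 else 1.0
        j = (qq + 2) // 2        # 1-based magnitude index, as in the listing
        eq = sign * self.out[j - 1] * self.size
        self.size = max(self.expn[j - 1] * self.size, rms * self.smin)
        return eq
```

With the illustrative tables `out=[0.0, 0.5, 1.5]` and `expn=[0.5, 1.0, 2.0]`, the innermost level shrinks the step size and the outermost level doubles it, which is the usual behavior of a multiplier-based adaptive quantizer.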


C      INITIALIZATION
C
C  SUBROUTINE INSTRT INITIALIZES THE PARAMETERS.
C
      SUBROUTINE INSTRT
      COMMON /QUAN/T(20),OUT(20),EXPN(20),SIZE,SMIN,NQ
      COMMON /PRED/G,N,RMSMIN,ALP,AINV,KQ,NSPSAM,A(12),VHAT(12),EV
     1 ,ISTAT1(40),EP,ALAD
      COMMON /RMS/RMS
      READ(4,*)AINV,ALP,ALAD,G,N,RMSMIN,SMIN
      WRITE(6,2)AINV,ALP,ALAD,G,N,RMSMIN,SMIN
2     FORMAT(/6X,'AINV=',F5.2,2X,'ALP=',F5.2,2X,'ALAD=',F5.2,2X,'G=',
     1 F5.3,2X,'N=',I2,2X,'RMSMIN=',F5.1,2X,'SMIN=',F5.2/)
      READ(4,*)KQ
      WRITE(6,3)KQ
3     FORMAT(6X,'NUMBER OF QUANT LEVELS=',I2)
      NQ=KQ/2
      NQQQ=NQ+1
      READ(4,*)(EXPN(I),I=1,NQQQ)
      WRITE(6,4)(I,EXPN(I),I=1,NQQQ)
4     FORMAT(6X,6('EXPN(',I2,')=',F6.2,2X))
      READ(4,*)(OUT(I),I=1,NQQQ)
      WRITE(6,5)(I,OUT(I),I=1,NQQQ)
5     FORMAT(6X,6('OUT(',I2,')=',F6.2,2X))
      SIZE=100.
      DO 118 I=1,12
      VHAT(I)=0.
118   A(I)=0.
      RMS=RMSMIN
      A(1)=AINV
      RETURN
      END


C      ARC RECEIVER
C
      SUBROUTINE ARCR(Q,VHAT1)
      INTEGER Q
      COMMON /PRED/G,N,RMSMIN,ALP,AINV,KQ,NSPSAM,A(12),VHAT(12),
     1 EV,ISTAT1(40),EP,ALAD
      COMMON /RMS/RMS
C
C  --  PREDICTION.
C
      PRE=0.
      DO 120 I=1,N
120   PRE=PRE+A(I)*VHAT(I)
      RMS=ALP*(RMS-RMSMIN)+(1.-ALP)*ABS(VHAT(1))+RMSMIN
      CALL INVQUA(Q,EQ)
      DO 125 I=1,N
      J=N+2-I
125   VHAT(J)=VHAT(J-1)
      VHAT(1)=PRE+EQ
      VHAT1=VHAT(1)
C
C  --  ADAPTATION.
C
      ERR=G*EQ/RMS**2
      A(1)=A(1)+AINV*(1./ALAD-1.)
      DO 130 I=1,N
130   A(I)=A(I)*ALAD+ERR*VHAT(I+1)
      RETURN
      END
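
The prediction and adaptation steps of ARCR amount to a sequentially adaptive linear predictor with leakage toward the initial coefficient AINV. The Python sketch below restates one receiver update under stated assumptions: the class name `ArcReceiver` is hypothetical, the dequantized residual `eq` is passed in directly rather than obtained from INVQUA, and the COMMON-block variables become attributes.

```python
class ArcReceiver:
    """Sketch of one ARC receiver update (prediction, then adaptation)."""

    def __init__(self, n, ainv, alp, alad, g, rmsmin):
        self.n, self.ainv, self.alp = n, ainv, alp
        self.alad, self.g, self.rmsmin = alad, g, rmsmin
        self.a = [0.0] * n             # predictor coefficients A(1..N)
        self.a[0] = ainv
        self.v = [0.0] * (n + 1)       # past outputs VHAT(1..N+1)
        self.rms = rmsmin

    def step(self, eq):
        """Return the reconstructed sample for dequantized residual eq."""
        # Prediction from the N most recent reconstructed samples.
        pre = sum(a * v for a, v in zip(self.a, self.v))
        # Recursive estimate of the signal level, floored at rmsmin.
        self.rms = (self.alp * (self.rms - self.rmsmin)
                    + (1.0 - self.alp) * abs(self.v[0]) + self.rmsmin)
        # Shift the history and store the new reconstructed sample.
        self.v = [pre + eq] + self.v[:-1]
        # Gradient-style coefficient update with leakage toward ainv.
        err = self.g * eq / self.rms ** 2
        self.a[0] += self.ainv * (1.0 / self.alad - 1.0)
        for i in range(self.n):
            self.a[i] = self.a[i] * self.alad + err * self.v[i + 1]
        return self.v[0]
```

With `g = 0` and `alad = 1` the coefficients stay fixed, and the receiver reduces to a first-order recursive filter with coefficient `ainv`, which is a convenient sanity check.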


C      PITCH EXTRACTION
C
      SUBROUTINE PICH(IBUF,NP,ITMAX,ITMIN,KBLK1,NUMP,NBUFF,IOPT2)
      DIMENSION A(200),IBUFF(512),IA(200),IBUF(512)
      COMMON /CLIPP/CLPP
      NUMP = NUMP + 1
      N1 = NBUFF/3
      N2 = N1 + N1
      DO 5 I = 1,NBUFF
      IBUFF(I)=IBUF(I)
5     CONTINUE
      DO 10 I = 1,200
      A(I) = 0.0
      IA(I) = 0
10    CONTINUE
C
C  CHECK IF CENTER CLIPPING IS ASKED FOR.
C
      IF(IOPT2 .LE. 2) GOTO 280
C
C  FIND OUT ABSOLUTE MAXIMUM OUT OF NBUFF SAMPLES IN THE BUFFER 'IBUFF'
C
      IBIG1 = 0
      IBIG2 = 0
      DO 120 I = 1,N1
      ISPABS=ABS(IBUFF(I))
      IF(ISPABS .GT. IBIG1)IBIG1=ISPABS
120   CONTINUE
      DO 123 I = N2,NBUFF
      ISPABS=ABS(IBUFF(I))
      IF(ISPABS .GT. IBIG2)IBIG2=ISPABS
123   CONTINUE
      IB = IBIG1 - IBIG2
      IF(IB .GE. 0)IBIG=IBIG2
      IF(IB .LT. 0)IBIG=IBIG1
C
C  ENTER THE CLIPPING LEVEL.
C
      CL = CLPP * FLOAT(IBIG)
C
C  CHECK IF CENTER CLIPPING WITH THREE OR TWO VALUES IS REQUIRED.
C
      IF(IOPT2 .GT. 4) GOTO 155
C
C  CLIP THE SPEECH WAVEFORM.
C
      CLM=-CL
      DO 140 I=1,NBUFF
      XFLT=FLOAT(IBUFF(I))
      IF((XFLT .LE. CL) .AND. (XFLT .GT. CLM))IBUFF(I)=0
140   CONTINUE
      GOTO 280
155   CONTINUE
      IF(IOPT2 .GT. 6)GOTO 360
      CLM=-CL
      DO 160 I=1,NBUFF
      XFLOT=FLOAT(IBUFF(I))
      IF((XFLOT .LE. CL) .AND. (XFLOT .GT. CLM))IBUFF(I)=0
      IF(XFLOT .GT. CL)IBUFF(I)=+1
      IF(XFLOT .LE. -CL)IBUFF(I)=-1
160   CONTINUE
C
      IF(IOPT2 .GT. 5)GOTO 220
C
C  COMPUTE AUTOCORRELATION FUNCTIONS
C
      IBIGG=-1000
      DO 180 IT=ITMIN,ITMAX
      ISUM=0
      DO 170 J=1,KBLK1
      IF(((IBUFF(J+IT).LT.0).AND.(IBUFF(J).LT.0)) .OR. ((IBUFF(J+IT)
     1 .GT.0).AND.(IBUFF(J).GT.0)))ISUM=ISUM+1
      IF(((IBUFF(J+IT).LT.0).AND.(IBUFF(J).GT.0)) .OR. ((IBUFF(J+IT)
     1 .GT.0).AND.(IBUFF(J).LT.0)))ISUM=ISUM-1
170   CONTINUE
      IA(IT)=ISUM
      IF(IBIGG .LT. IA(IT))IBIGG=IA(IT)
180   CONTINUE
      DO 190 I=ITMIN,ITMAX
      IF(IBIGG .EQ. IA(I))GOTO 200
190   CONTINUE
200   NP=I
      WRITE(5,*)NP
      RETURN
C
C  --  CALCULATE AMDF FUNCTIONS
220   CONTINUE
      ISMALL=4096
      DO 230 IT=ITMIN,ITMAX
      ISUM=0
      DO 240 J=1,KBLK1
      IF(((IBUFF(J+IT).EQ.0).AND.(IBUFF(J).NE.0)) .OR.
     1 ((IBUFF(J+IT).NE.0).AND.(IBUFF(J).EQ.0)))ISUM=ISUM+1
      IF(((IBUFF(J+IT).GT.0).AND.(IBUFF(J).LT.0)) .OR.
     1 ((IBUFF(J+IT).LT.0).AND.(IBUFF(J).GT.0)))ISUM=ISUM+2
240   CONTINUE
      IA(IT)=ISUM
      IF(ISMALL .GE. IA(IT))ISMALL=IA(IT)
230   CONTINUE
      DO 250 I=ITMIN,ITMAX
      IF(ISMALL .EQ. IA(I))GOTO 260
250   CONTINUE
260   NP=I
      WRITE(5,*)NP
      RETURN
280   CONTINUE
      IF((IOPT2.EQ.2).OR.(IOPT2.EQ.4))GOTO 340
      DO 300 IT=ITMIN,ITMAX
      SUM=0.0
      DO 290 J=1,KBLK1
      SUM=SUM+FLOAT(IBUFF(J+IT))*FLOAT(IBUFF(J))
290   CONTINUE
      A(IT)=SUM
      IF(BIG .LT. A(IT))BIG=A(IT)
300   CONTINUE
      DO 310 I=ITMIN,ITMAX
      IF(BIG.EQ.A(I))GOTO 320
310   CONTINUE
320   NP=I
      WRITE(5,*)NP
      RETURN
340   CONTINUE
      SMALL = 1.0E+09
      DO 60 IT = ITMIN,ITMAX
      SUM = 0.0
      DO 50 J=1,KBLK1
      SUM = SUM + ABS(FLOAT(IBUFF(J+IT) - IBUFF(J)))
50    CONTINUE
      A(IT) = SUM
      IF(SMALL .GE. A(IT)) SMALL = A(IT)
60    CONTINUE
      DO 70 I = ITMIN,ITMAX
      IF(SMALL .EQ. A(I)) GOTO 80
70    CONTINUE
80    NP = I
      WRITE(5,*)NP
      RETURN
360   CONTINUE
C
C  CLIP THE SPEECH TO TWO VALUES.
C
      CL=0.0
      DO 380 I = 1,NBUFF
      XFLOT=FLOAT(IBUFF(I))
      IF(XFLOT .LE. CL)IBUFF(I)=-1
      IF(XFLOT .GT. CL)IBUFF(I)=1
380   CONTINUE
      IF(IOPT2 .GT. 7)GOTO 480
      IBIGG = -1000
      DO 420 IT=ITMIN,ITMAX
      ISUM=0
      DO 400 J=1,KBLK1
      IF(IBUFF(J+IT) .EQ. IBUFF(J))ISUM=ISUM+1
      IF(IBUFF(J+IT) .NE. IBUFF(J))ISUM=ISUM-1
400   CONTINUE
      IA(IT)=ISUM
      IF(IBIGG .LT. IA(IT)) IBIGG=IA(IT)
420   CONTINUE
      DO 440 I=ITMIN,ITMAX
      IF(IBIGG .EQ. IA(I))GOTO 460
440   CONTINUE
460   NP=I
      WRITE(5,*)NP
      RETURN
480   CONTINUE
      ISMALL=4096
      DO 500 IT=ITMIN,ITMAX
      ISUM=0
      DO 490 J=1,KBLK1
      IF(IBUFF(J+IT) .NE. IBUFF(J))ISUM=ISUM+2
490   CONTINUE
      IA(IT)=ISUM
      IF(ISMALL .GE. IA(IT))ISMALL=IA(IT)
500   CONTINUE
      DO 520 I=ITMIN,ITMAX
      IF(ISMALL .EQ. IA(I))GOTO 540
520   CONTINUE
540   NP=I
      WRITE(5,*)NP
      RETURN
      END
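
Of the pitch estimators selectable in PICH, the one based on a three-level clipped signal and a sign-agreement autocorrelation is the one the report highlights for its modest hardware needs. A minimal Python sketch of that branch follows. The function names, the default clipping fraction `clpp=0.6` (the listing reads CLPP from a file), and the 0-based indexing are assumptions for illustration; the other options (AMDF, two-level clipping, unclipped correlation) are not covered.

```python
def three_level_clip(x, cl):
    """Map each sample to +1 (above cl), -1 (at or below -cl), or 0."""
    return [1 if s > cl else (-1 if s <= -cl else 0) for s in x]


def pitch_by_clipped_autocorr(x, itmin, itmax, kblk, clpp=0.6):
    """Return the lag in [itmin, itmax] that maximizes the clipped correlation.

    The clipping level is clpp times the smaller of the peak magnitudes found
    in the first and last thirds of the buffer. Requires
    len(x) >= kblk + itmax.
    """
    n1 = len(x) // 3
    big1 = max(abs(s) for s in x[:n1])
    big2 = max(abs(s) for s in x[2 * n1 - 1:])
    c = three_level_clip(x, clpp * min(big1, big2))
    best_lag, best = itmin, None
    for lag in range(itmin, itmax + 1):
        # The product of clipped values is +1 for equal signs and -1 for
        # opposite signs, matching the two IF tests in the listing.
        s = sum(c[j + lag] * c[j] for j in range(kblk))
        if best is None or s > best:   # strict '>' keeps the first maximum
            best, best_lag = s, lag
    return best_lag
```

Because the clipped samples take only the values -1, 0 and +1, the correlation needs no multiplications in hardware, only a sign comparison and an up/down counter.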
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